Predicting Hearing Loss in Testicular Cancer Patients after Cisplatin-Based Chemotherapy

Simple Summary
To our knowledge, this is the first study to present a machine learning setup incorporating genetic and clinical factors to predict hearing loss in a large and fairly unique cohort of testicular cancer patients, with follow-up data examining long-range side effects of chemotherapy. Genetic variants in SOD2 and MGST3 are proposed as mechanistically associated with cisplatin-induced hearing loss. Further, the models in this study focus on individual patient benefit and on incorporating quality of life measures to identify the impact of hearing loss. Testicular cancer is an ideal model disease for studying short- and long-term effects of chemotherapy in other cancers, as patients are young, have a long life expectancy, and have no significant comorbidity. With small adjustments, the model can likely be applied in the treatment of other cancers where cisplatin is used, thus helping with choice of treatment without risking a trade-off in efficacy, and standing to influence clinical practice.

Abstract
Testicular cancer is predominantly curable, but the long-term side effects of chemotherapy have a severe impact on quality of life. In this study, we focus on hearing loss as a part of overall chemotherapy-induced ototoxicity. This is a unique approach where we combine clinical data from the acclaimed nationwide Danish Testicular Cancer (DaTeCa)-Late database. Clinical and genetic data on 433 patients were collected from hospital files in October 2014. Hearing loss was classified according to the FACT/GOG-Ntx-11 version 4 self-reported Ntx6. Machine learning models combining a genome-wide association study within a nested cross-validated logistic regression were applied to identify patients at high risk of hearing loss. The model comprising clinical and genetic data identified 67% of the patients with hearing loss; however, this was with a false discovery rate of 49%. For the non-affected patients, the model identified 66% of the patients with a false omission rate of 19%. An area under the receiver operating characteristic curve (ROC-AUC) of 0.73 (95% CI, 0.71–0.74) was obtained, and the model suggests genes SOD2 and MGST3 as important in improving prediction over the clinical-only model with a ROC-AUC of 0.66 (95% CI, 0.65–0.66). Such prediction models may be used to allow earlier detection and prevention of hearing loss. We suggest a possible biological mechanism for cisplatin-induced hearing loss development. On confirmation in larger studies, such models can help balance treatment in clinical practice.


Introduction
Testicular cancer is the most common cancer in men below 40 years of age in developed countries with a continuously rising incidence in many countries [1]. It is a highly curable disease with a 5-year survival of more than 90% disregarding initial stage, which results in

Source of the Data
Long-term TCS were identified in the DaTeCa-Late cohort [13] with patients initially treated for testicular cancer in Denmark from 1984 to 2007. All patients in this cohort filled in a range of questionnaires related to late toxicity from January 2014 to December 2016 (n = 2572). Clinical features were identified in hospital files [14]. In October 2014, 433 of these TCS who had received one line of treatment were asked to deliver a saliva sample for genotyping, as previously described [15].
Patients gave informed consent to participate in this study, and the study was approved by the regional ethical committee (File number, H-2-2012-044) and the National Board of Data Protection (File number, 2012-41-0751).

Treatment and Clinical Information
All patients received bleomycin-etoposide-cisplatin (BEP) for disseminated testicular cancer, for three cycles or more, as previously described [15]. The majority received cisplatin 20 mg/m² and etoposide 100 mg/m² for five days each cycle, while 43 (10%) received double-dose cisplatin (40 mg/m²) and etoposide (200 mg/m²) as part of a research protocol. Bleomycin was administered at a dose of 15,000 IU/m² with a cumulative maximum dose of 150,000 IU/m².
Clinical information consisted of age at diagnosis and at time of completion of the questionnaire, body mass index (BMI), glomerular filtration rate before treatment, cumulative cisplatin dose per square meter of body surface area (BSA), number of BEP cycles, histology (seminoma vs. non-seminoma), prognostic classification as per International Germ Cell Cancer Collaborative Group (IGCCCG) [16], alcohol consumption (units/week), and smoking habits (never; former; or current). BMI, alcohol, and smoking information were collected at the time of the questionnaire. Age at time of completion of the questionnaire was correlated with age at diagnosis (Pearson correlation 0.76) and omitted for further analysis.

Assessment of Hearing Loss
Self-perceived hearing loss was assessed with the Ntx subscale of the FACT/GOG-Ntx-11, version 4, which evaluates the severity and impact of neuropathy [17]. The questionnaire consists of 11 items rated from 0 (not at all) to 4 (very much). The scale can be divided into four subscales: sensory neuropathy, motor neuropathy, auditory neuropathy, and dysfunctional problems [17]. Auditory neuropathy comprises two different questions, where FACT/GOG-Ntx6 measures difficulty hearing, and FACT/GOG-Ntx7 measures tinnitus (Supplementary Note S1).
Here, we aim at predicting hearing loss; thus, only FACT/GOG-Ntx6 is further explored. FACT/GOG-Ntx6 and FACT/GOG-Ntx7 were not strongly correlated, which may indicate different biological etiologies. For FACT/GOG-Ntx6, to ensure clinical relevance, the outcome was dichotomized. Low-risk (score from 0 to 1) and high-risk groups (score from 2 to 4) were considered.
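The dichotomization described above can be written as a one-line rule. The following is a minimal sketch; the function name `dichotomize_ntx6` and the 0/1 label encoding are illustrative assumptions, not taken from the study code.

```python
def dichotomize_ntx6(score: int) -> int:
    """Map a FACT/GOG-Ntx6 item score (0-4) to 0 (low risk) or 1 (high risk)."""
    if score not in range(5):
        raise ValueError(f"Ntx6 score must be an integer from 0 to 4, got {score!r}")
    # Scores 0-1 form the low-risk group, scores 2-4 the high-risk group.
    return 1 if score >= 2 else 0


scores = [0, 1, 2, 3, 4]
labels = [dichotomize_ntx6(s) for s in scores]
print(labels)  # [0, 0, 1, 1, 1]
```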
It is important to point out that the FACT/GOG-Ntx questionnaire was completed in 2014, and patients had answered FACT/GOG-Ntx6 according to their current experience of hearing levels. However, at that time, the patients were also asked if they recalled experiencing worse hearing during treatment (hearing change question 1, HC Q1), and whether it returned to normal afterwards (hearing change question 2, HC Q2). Even though HC Q1 and HC Q2 are not validated at the same level as FACT/GOG-Ntx [17], we used Spearman's rank correlation between FACT/GOG-Ntx6 and HC Q1 and HC Q2 to understand if the reported patients' hearing loss at the time of the FACT/GOG-Ntx questionnaire was due to cisplatin treatment.
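For readers unfamiliar with Spearman's rank correlation, a minimal standard-library sketch is given here: it is the Pearson correlation computed on ranks, with tied values sharing their average rank (relevant for 0 to 4 questionnaire items, which are heavily tied). The helper names and the toy vectors are hypothetical and do not reproduce study data.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical answers: an Ntx6 item (0-4) and a yes/no hearing-change item.
ntx6_toy = [0, 1, 2, 3, 4, 0, 2]
hc_toy = [0, 0, 1, 1, 1, 0, 1]
print(round(spearman_rho(ntx6_toy, hc_toy), 2))
```

The function is undefined when either variable is constant, since the rank variance is then zero.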
Genotyping data were converted into pedigree format using GenomeStudio® (v2011.1) with PLINK Input Report Plug-in (v2.1.3). Variants with genotyping call rate < 0.98, not in Hardy-Weinberg equilibrium (p value < 5 × 10⁻⁶), or with a minor allele frequency < 0.01 were excluded. Quality control for both single nucleotide polymorphisms (SNPs) and patient samples is described in detail in Supplementary Figure S1.
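The three variant-level exclusion criteria can be illustrated with a small filter. This is a sketch under stated assumptions: the `Variant` record, its field names, and the example variants are hypothetical, and the Hardy-Weinberg p value is assumed to be precomputed by a tool such as PLINK rather than calculated here.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str          # hypothetical identifier
    n_hom_ref: int     # samples homozygous for the reference allele
    n_het: int         # heterozygous samples
    n_hom_alt: int     # samples homozygous for the alternate allele
    n_missing: int     # samples without a genotype call
    hwe_p: float       # Hardy-Weinberg p value (assumed precomputed)


def call_rate(v: Variant) -> float:
    called = v.n_hom_ref + v.n_het + v.n_hom_alt
    return called / (called + v.n_missing)


def minor_allele_frequency(v: Variant) -> float:
    called = v.n_hom_ref + v.n_het + v.n_hom_alt
    alt_freq = (2 * v.n_hom_alt + v.n_het) / (2 * called)
    return min(alt_freq, 1 - alt_freq)


def passes_qc(v: Variant, min_call_rate=0.98, min_hwe_p=5e-6, min_maf=0.01) -> bool:
    """Keep a variant only if it clears all three thresholds from the text."""
    return (
        call_rate(v) >= min_call_rate
        and v.hwe_p >= min_hwe_p
        and minor_allele_frequency(v) >= min_maf
    )


variants = [
    Variant("common_ok", 200, 150, 50, 0, 0.5),
    Variant("low_call_rate", 200, 150, 40, 10, 0.5),
    Variant("rare", 398, 2, 0, 0, 0.9),
    Variant("hwe_violation", 200, 150, 50, 0, 1e-8),
]
kept = [v.name for v in variants if passes_qc(v)]
print(kept)  # ['common_ok']
```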

Logistic Regression with Cross-Validated GWAS
A nested cross-validated logistic regression with five outer and five inner folds was implemented using scikit-learn [28] (v0.23.2) in Python (v3.6.10). As performance was similar across other machine learning models (random forests and artificial neural networks), the simpler logistic regression was chosen for its interpretability and ease of eventual implementation.
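To make the nested structure concrete, the sketch below generates the index sets only: each of the five outer folds is held out once as a test set, and the remaining samples are split again into five inner training-validation pairs. It uses the standard library rather than scikit-learn, does not stratify by outcome, and the function names are assumptions for illustration; the actual splitting procedure used in the study may differ.

```python
import random


def k_fold_indices(indices, k, rng):
    """Shuffle indices and deal them into k nearly equal folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, n_outer=5, n_inner=5, seed=0):
    """Yield (outer_fold, inner_fold, train_idx, validation_idx, test_idx)."""
    rng = random.Random(seed)
    outer_folds = k_fold_indices(range(n_samples), n_outer, rng)
    for o, test_idx in enumerate(outer_folds):
        # Everything outside the outer test fold is available for model development.
        development = [i for f, fold in enumerate(outer_folds) if f != o for i in fold]
        inner_folds = k_fold_indices(development, n_inner, rng)
        for v, validation_idx in enumerate(inner_folds):
            train_idx = [i for f, fold in enumerate(inner_folds) if f != v for i in fold]
            yield o, v, train_idx, validation_idx, test_idx


splits = list(nested_cv_splits(n_samples=393))
print(len(splits))  # 25
```

Keeping the outer test indices out of every inner split is what prevents feature selection and tuning from seeing the data used for the final performance estimate.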
Forward feature selection and parameter optimization were performed in the inner training-validation sets, and the model was deployed on the outer test sets. Initially, only clinical data were included in the model. The area under the ROC curve (ROC-AUC) was used to evaluate the model's prediction ability. An increasing number of clinical features was evaluated in exhaustive combinations until the ROC-AUC reached a plateau.
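A greedy forward selection that stops at a plateau can be sketched as follows. This is an illustrative simplification, not the study's implementation: `score_fn` stands in for the mean inner-validation ROC-AUC of a model fitted on the given features, the `min_gain` plateau threshold is an assumed parameter, and the toy gains are invented numbers.

```python
def forward_selection(candidates, score_fn, min_gain=0.005):
    """Greedily add the feature that most improves score_fn until gains plateau."""
    selected, best_score = [], float("-inf")
    remaining = list(candidates)
    while remaining:
        # Score every one-feature extension of the current feature set.
        scored = [(score_fn(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(scored)
        # Stop once the best extension no longer improves the score meaningfully.
        if selected and top_score - best_score < min_gain:
            break
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score


# Toy stand-in for a validation score: a baseline of 0.5 plus invented per-feature gains.
toy_gain = {"feature_a": 0.10, "feature_b": 0.06, "feature_c": 0.001, "feature_d": 0.0}


def toy_score(features):
    return 0.5 + sum(toy_gain[f] for f in features)


selected, score = forward_selection(toy_gain, toy_score)
print(selected)  # ['feature_a', 'feature_b']
```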
The genetic data were then added to the model. A cross-validated GWAS was performed on the inner training sets to select SNPs for model training. Genetic variants were tested for association with hearing loss using logistic regression (PLINK [29] (v1.9)) adjusting for potential confounding effects: age at time of questionnaire and cisplatin dose. A suggestive p value threshold of 1 × 10⁻⁴ was used to select SNPs for model training (Supplementary Figure S2). Then, forward feature selection was performed on the combined dataset comprising both SNPs identified through GWAS and a systematic review search, along with the clinical data. SHapley Additive exPlanations (SHAP) values [30] helped interpret the impact of individual features contributing to the model's performance.
The dataset was randomly split 30 different times into training, validation, and test sets to ensure model reproducibility and robustness. More information on model hyperparameters, encoding of variables, and feature normalization is included in Supplementary Note S3.
For the model with clinical data only, permutation tests were applied to ensure the model was not fitting random noise. For the model with clinical and genetic data, this was achieved by adding randomly selected SNPs.
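The idea behind such a permutation test can be shown with a small standard-library sketch. As a simplifying assumption, it shuffles the outcome labels rather than the input variables; either way the link between predictors and outcome is broken, so the ROC-AUC should fall to about 0.50. The function names and the toy data are hypothetical.

```python
import random
from statistics import mean


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permuted_auc(labels, scores, n_permutations=200, seed=0):
    """Mean ROC-AUC after repeatedly shuffling the labels (the null reference)."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_permutations):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        aucs.append(roc_auc(shuffled, scores))
    return mean(aucs)


# Toy data in which the score separates the two groups perfectly.
toy_labels = [0] * 20 + [1] * 20
toy_scores = [i / 40 for i in range(40)]

observed = roc_auc(toy_labels, toy_scores)
null_mean = permuted_auc(toy_labels, toy_scores)
print(observed)  # 1.0
```

A model that fits noise would show little difference between the observed value and the permuted reference.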
In this study, we used the FACT/GOG-Ntx-11 version 4, which provides a targeted assessment of peripheral neuropathy such as auditory neuropathy. Auditory neuropathy comprises two different questions, where FACT/GOG-Ntx6 measures difficulty hearing, and FACT/GOG-Ntx7 measures tinnitus (Supplementary Note S1). A moderate correlation was observed between FACT/GOG-Ntx6 and FACT/GOG-Ntx7 (Spearman's rank correlation coefficient 0.55). Additionally, the patients were asked if they recalled experiencing worse hearing during treatment (hearing change question 1, HC Q1) and whether it returned to normal afterwards (hearing change question 2, HC Q2). In order to understand if the patient's hearing loss at the time of the FACT/GOG-Ntx questionnaire (2014) was due to cisplatin treatment (between 1984 and 2007), we investigated the correlation between these questions as well. FACT/GOG-Ntx6 from the validated FACT/GOG-Ntx questionnaire showed a high correlation with HC Q2 concerning self-perceived long-lasting changes after treatment (Spearman's rank correlation coefficient 0.56 for HC Q1 and 0.76 for HC Q2).
First, the predictive ability of the routinely available clinical information was assessed. Nine features were incrementally included in the model by exploring exhaustive combinations with each single feature addition. The two most informative clinical features, age at diagnosis and number of treatment cycles (ROC-AUC of 0.66 (95% CI, 0.65-0.66), Figure 1A,C), were prioritized for further modeling and combined with genetic data from the SNP array chip.
Prediction performance, assessed as ROC-AUC, reached a plateau when six genetic features were added to the model (in addition to the two most informative clinical parameters), with a mean ROC-AUC of 0.73 (95% CI, 0.71-0.74) (Figures 1B,D and 2A). The most informative SNPs were SOD2 rs4880, MGST3 rs9333378, intergenic rs4389005, ABCA10 rs10491178, ABCA12 rs10498027, and MCM8 rs3761873 (Table 3). Out of 30 models, these SNPs were selected 15, 9, 7, 6, 5, and 4 times, respectively (Supplementary Table S1). Only the intergenic rs4389005 was pre-selected from the cross-validated GWAS. All other SNPs had a p value > 1 × 10⁻⁴ and were pre-selected from a systematic review of genes shown to be related to cisplatin metabolism or ototoxicity. The two most influential SNPs according to SHAP values [30] were SOD2 rs4880 and MGST3 rs9333378 (Figure 3). Homozygous genotypes for the risk alleles SOD2 rs4880:AA and MGST3 rs9333378:AA were found in 47% of patients who replied FACT/GOG-Ntx6 = 0 or 1, 63% of patients who replied FACT/GOG-Ntx6 = 2, and 76% of patients who replied FACT/GOG-Ntx6 = 3 or 4 (chi-squared p value 1 × 10⁻⁴).
For each sample, prediction scores ranged between 0 and 1, where a value closer to 1 indicated a higher probability of hearing loss. Using a default cut-off of 0.50, a sensitivity of 67% and a positive predictive value of 51% were reached. Correspondingly, this resulted in a specificity of 66% and a negative predictive value of 80% (Figure 2B). The model performed best on patients with the highest toxicity (Figure 2C).
For most patients (320 out of 393), adding genetic data improved hearing loss prediction; however, for 42 out of 320, this was still not enough to correctly classify these patients. In 7 out of 393 patients, the addition of genetic data led to misclassification. For 55 out of 393 patients, neither clinical nor genetic data helped with prediction and/or classification (Supplementary Figure S3).
Overall, we were able to improve prediction performance when adding genetic features to clinical data (ROC-AUC 0.73) compared to the models with only clinical data (ROC-AUC 0.66).
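The sensitivity, specificity, and predictive values reported above, and the false discovery and false omission rates quoted in the abstract, are all derived from the same confusion matrix; the false discovery rate is one minus the positive predictive value, and the false omission rate is one minus the negative predictive value. A minimal sketch follows. The counts are invented round numbers for illustration and are not the study's confusion matrix.

```python
def classification_rates(tp, fp, tn, fn):
    """Derive the standard rates from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of affected patients flagged
    specificity = tn / (tn + fp)   # share of unaffected patients cleared
    ppv = tp / (tp + fp)           # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "false_discovery_rate": 1 - ppv,
        "false_omission_rate": 1 - npv,
    }


# Hypothetical counts at a 0.50 cut-off (illustrative only).
rates = classification_rates(tp=40, fp=40, tn=80, fn=20)
for name, value in rates.items():
    print(f"{name}: {value:.2f}")
```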
To test the robustness and non-randomness of the selected models, all variables in the models with only clinical data were permuted, which led to a mean ROC-AUC close to 0.50 throughout the forward feature selection (Supplementary Figure S4A). The mean ROC-AUC for the random model with two features was 0.50 (95% CI, 0.49-0.51) (Supplementary Figure S4B).
In an additional test, random genetic variants were added to the model with the informative clinical features (age at diagnosis and number of treatment cycles). Mean ROC-AUC was 0.67 (95% CI, 0.66-0.68) for the model with six random genetic variants and two informative clinical features (Supplementary Figure S4D), which was not appreciably different from the ROC-AUC with two clinical features only (0.66 (95% CI, 0.65-0.66)), indicating that the random genetic variants did not add any relevant information for the prediction. From this point on, ROC-AUC steadily decreased as more randomly selected SNPs were added to the models (Supplementary Figure S4C).

Discussion
In this study, we present a model for the prediction of hearing loss after cisplatin-containing chemotherapy based on a combination of clinical and genetic features, achieving a classification performance of ROC-AUC 0.73. We observed an improved prediction after the inclusion of genetic data compared to clinical data only. Age at diagnosis and number of treatment cycles were the most important clinical predictors, matching what has previously been reported [7,9,31].
We have focused on hearing loss as part of ototoxicity, as we did not observe a strong correlation between hearing loss (FACT/GOG-Ntx6) and tinnitus (FACT/GOG-Ntx7), which may indicate independent biological mechanisms. Indeed, not all people who suffer from hearing loss have tinnitus, and vice versa, and studies on the genetics behind tinnitus are still at an early stage [32,33].
The first SNP selected in the model, the functional rs4880 SNP, is located on codon 16 exon 2 of SOD2 that codes for the superoxide dismutase 2 (SOD2), a mitochondrial protein [34]. SNP rs4880 is the most studied SOD2 SNP [35]; however, there is no agreement regarding how it influences SOD2 enzymatic activity. SNP rs4880 (A > G, Val16Ala) is predicted to change the structure of the SOD2 mitochondrial targeting sequence, converting a beta-sheet secondary structural motif to a partial alpha-helix [36]. Some state that due to partial arrest of the beta-sheet structure during transport across the inner mitochondrial membrane, this will likely inhibit efficient mitochondrial import of Val-SOD2 precursors and, thus, reduce enzyme activity [37]. A follow-up study has reported that the Ala variant, associated with increased SOD2 activity according to the previously mentioned study, was associated with hearing damage in cisplatin-treated pediatric medulloblastoma [38]. However, others have measured SOD2 activity and observed that it was lower in SOD2 rs4880:GG individuals compared with SOD2 rs4880:AA, or SOD2 rs4880:GA [39].
The second SNP selected in the model, SNP rs9333378, is located in MGST3, that codes for the microsomal glutathione S-transferase 3 (MGST3) [34]. Among the microsomal glutathione S-transferases, MGST1, MGST2, and MGST3 have been reported to be important in the detoxification process [40].
Here, we hypothesized a combined effect of SOD2 rs4880 and MGST3 rs9333378 on cisplatin-induced hearing loss.
When platinum enters the cells, it is metabolized by the mitochondria, which will lead to the production of reactive oxygen species (ROS) such as superoxide. SOD2 will then degrade superoxide into hydrogen peroxide until complete superoxide anion degradation. If SOD2 is prevented from entering the mitochondria due to partial arrest of beta-helix, this may lead to an accumulation of ROS. ROS cause lipid peroxidation, activation of pro-inflammatory factors, and cell death by apoptosis, including hair cells [41,42]. Indeed, we observed the A-allele with a higher frequency in patients who reported hearing loss (odds ratio = 1.55, 95% CI: 1.13-2.13), contrary to what has been reported previously [38]. Furthermore, glutathiones, including glutathione S-transferase, are known to help with complete superoxide anion degradation [38]. The Genotype-Tissue Expression (GTEx) database [43] reports lower MGST3 expression levels for rs9333378:AA compared to the rs9333378:GG genotype in the brain. It is hypothesized that the rs9333378 variant leads to accumulation of cisplatin in the hair cells through decreased MGST3 activity.
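For orientation, an allelic odds ratio with a 95% confidence interval can be obtained from a 2 × 2 table of allele counts. The sketch below uses the Wald (Woolf) interval on the log scale; this is an assumption for illustration, since the text does not state how the reported interval was computed (it may, for example, come from a logistic regression). The counts in the example are invented.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald interval for a 2x2 table.

    a: risk alleles in cases      b: other alleles in cases
    c: risk alleles in controls   d: other alleles in controls
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts (Woolf's method).
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts (illustrative only).
or_value, ci_low, ci_high = odds_ratio_ci(a=20, b=10, c=10, d=20)
print(f"OR = {or_value:.2f}, 95% CI: {ci_low:.2f}-{ci_high:.2f}")
```

An interval that excludes 1, as the reported one does, indicates that the allele frequency differs between the two groups at the chosen confidence level.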
Additionally, potential novel variants associated with cisplatin-induced hearing loss were selected by the logistic regression model. SNP rs4389005, located in an intergenic region, was found in the cross-validated GWAS. The closest gene is GPR12 (64 kilobase pairs 5′ to the canonical transcription start site), which encodes a G protein-coupled receptor (GPCR). GPCRs are involved in several physiological and pathological functions [44]. The subsequent SNPs, found via the systematic review search and contributing to model performance, were ABCA10 rs10491178 and ABCA12 rs10498027, both leading to stop-gains within ABCA genes, which encode ATP-binding cassette (ABC) transporters. Overexpression of ABC transporters has been associated with multidrug resistance, including resistance to cisplatin, in multiple tumors [45]. ABCA10 rs10491178:GG has been associated with lower expression of ABCA6 [43], which can lead to higher sensitivity to cisplatin and higher toxicity [46]. MCM8 rs3761873, which leads to a stop-gain, was the last SNP selected by the model. MCM8 encodes the minichromosome maintenance 8 homologous recombination repair factor protein (MCM8), and in a recent mouse study, inhibition of MCM8 (and MCM9) hypersensitized cells to cisplatin [47].
While we observed a false discovery rate of 49% using a 0.50 cut-off, it is promising to see that only four of the twenty-three patients with the highest score (FACT/GOG-Ntx6 = 4) were misclassified. Three of them had a prediction score very close to 0.50 (two patients with 0.48 and one with 0.49 prediction scores). The last misclassified patient had a prediction score of 0.31 and was also the youngest of the 23. Furthermore, he received one of the lowest amounts of cisplatin (300 mg/m² and three treatment cycles). This points to other relevant features that led this patient to develop hearing loss, either clinical or genetic predispositions that might be underrepresented in this dataset and, hence, may not have been detected.
Diagnosing hearing loss is challenging, and a robust definition is still lacking [48]. Here, several potential contributors to hearing loss, such as noise, infection, or vascular problems, were not explored, and the toxicity was assessed several years after exposure. However, long-term toxicity also has a high impact on quality of life [9] and may be important to predict for balancing treatment intensity.
The models were trained on labels that derive from the FACT/GOG-Ntx questionnaire, which are not objectively quantified. Other measurements, such as pure-tone audiometry, which are not yet implemented routinely in clinical practice, could have been undertaken to improve precision [48]. On the other hand, using quality of life measures ensures that the focus is on the patient [49]. For instance, objective measurements might detect a similar level of toxicity between two individuals; however, only one may be affected by the symptoms and, thus, objective measurements may not be a true assessment of quality of life.
Further, BMI, as well as information about alcohol consumption and smoking habits, were retrieved in 2014 when the questionnaire was completed. These clinical features were used as a proxy at the time of treatment, but they may not represent the true values. While those features were not selected in the final model, we are unaware if the real values at the time of treatment could have added relevant information to the model. Incorporating longitudinal data, such as information collected one year after treatment, could also have been advantageous in further improving the model's performance.
Finally, models in this study were trained on 393 patients adhering to most of the best practices of healthcare-related prediction models [50] using a logistic regression with cross-validated GWAS; nonetheless a future replication on a larger and independent patient cohort would be warranted.
Cisplatin is essential in the treatment of several neoplasms; however, the inability to predict how patients will react to chemotherapy represents a major challenge, and hearing loss is one of the most common late side effects of cisplatin-based chemotherapy. In this study, we present a prediction model, a logistic regression with cross-validated GWAS, based on a combination of genetic and clinical features and able to classify patients at high risk (67% sensitivity) or low risk (66% specificity) of hearing loss after cisplatin-based treatment.
We also propose a combined effect of SOD2 rs4880 and MGST3 rs9333378 on the development of cisplatin-induced hearing loss. In our study, these SNPs did not yield significant results when tested individually for association with the outcome; thus, a combination of cross-validated GWAS and systematic review search is suggested as a feature selection approach for machine learning.
Following confirmation in a prospective clinical setting and replication in larger independent studies, such a model could be used as a complement to support clinical decision-making and help in reducing hearing loss cases by adjusting treatment for patients in the high-risk group, especially with treatment of other cancers where cisplatin is used.
Supplementary Figure S1: Step-by-step demonstration of genetic data quality control and information on patients where questionnaire information was missing. Single-nucleotide polymorphism (SNP) quality filtering included removal of duplicated SNPs and those with ambiguous genome position, strand, and alleles; call rate (<98%); extreme deviation from Hardy-Weinberg equilibrium (p value < 5 × 10⁻⁶); and MAF (<1%). Quality controls applied to the patient samples were based on genotype (chromosome X homozygosity rate <20% for females and >80% for males) and phenotype sex discordance; extreme heterozygosity or homozygosity (±4 SD from the sample's hetero-/homozygosity rate mean); outliers from European descent using 1000 Genomes (49) as reference samples; cryptic relatedness (IBD > 18.75%); and population outliers (±4 SD from cluster centroid mean). European outliers were detected by (1) performing principal component analysis (PCA) to find the center of the European reference samples, and (2) removing samples whose Euclidean distance from the center was > 1.5 × the maximum Euclidean distance of the European reference samples (50). Patients with missing questionnaire information consisted of 45 patients who received more than one line of treatment and therefore were not relevant for the present study and were not invited for the questionnaire in 2014. These were still included for the purpose of quality control only.
Supplementary Figure S2: Illustration of the logistic regression model used in this study. The model was run at Computerome 2.0 (https://www.computerome.dk, accessed on 12 November 2022 (developed for a range of time)). The 30 random data splits were run in parallel to reduce running time; thus, 32 nodes were used (30 allocated for each random split and 2 for other initializations). Each node contains 2 CPUs with 20 cores/CPU. 192 GB is the memory distributed through all cores.
Supplementary Figure S3: Misclassified patients and/or patients where genetic data "pushed" the final classification in the wrong direction. Each arrow starts at the prediction score of the model with only clinical data (model 1) and ends at the prediction score of the model with clinical and genetic data (model 2). A: Inclusion of genetic data helped but not enough to correctly classify these patients; B: Patients where genetic data "pushed" the classification in the wrong direction, even though some of them were correctly classified; C, D: Neither clinical nor genetic data helped with the classification of these patients (in D, the score difference between model 2 and model 1 was below 0.05). All other patients not represented here were correctly classified and genetic data "pushed" the classification in the right direction (or if in the wrong direction, the score difference between model 2 and model 1 was below 0.05).
Supplementary Figure S4