Validating Methylated HOXA9 in Bronchial Lavage as a Diagnostic Tool in Patients Suspected of Lung Cancer

Simple Summary
Diagnosing lung cancer requires invasive procedures with high risk of complications. Methylated tumor-specific DNA has been suggested as a biomarker for lung cancer. The present study aimed to develop and validate the biomarker methylated HOXA9 in fluid from the lung collected during bronchoscopy. This biomarker has a clinically relevant sensitivity and specificity for the diagnosis of lung cancer. Future research should focus on determining the optimal combination of biomarker and biologic specimen.

Abstract
Diagnosing lung cancer requires invasive procedures with high risk of complications. Methylated tumor DNA in bronchial lavage has previously shown potential as a diagnostic biomarker. We aimed to develop and validate methylated HOXA9 in bronchial lavage as a diagnostic biomarker of lung cancer. Participants were referred on suspicion of lung cancer. Ten mL lavage fluid was collected at bronchoscopy for analysis of methylated HOXA9 based on droplet digital PCR according to our previously published method. HOXA9 status was compared with the final diagnosis. The Discovery and Validation cohorts consisted of 101 and 95 consecutively enrolled participants, respectively. In the Discovery cohort, the sensitivity and specificity were 73.1% (95% CI 60.9–83.2%) and 85.3% (95% CI 68.9–95.0%), respectively. In the Validation cohort, the values were 80.0% (95% CI 66.3–90.0%) and 75.6% (95% CI 60.5–87.1%), respectively. A multiple logistic regression model including age, smoking status, and methylated HOXA9 status resulted in an AUC of 84.9% (95% CI 77.3–92.4%) and 85.9% (95% CI 78.4–93.4%) for the Discovery and Validation cohorts, respectively. Methylated HOXA9 in bronchial lavage holds potential as a supplementary tool in the diagnosis of lung cancer with a clinically relevant sensitivity and specificity. It remained significant when adjusting for age and smoking status.


Introduction
Lung cancer is considered the deadliest cancer worldwide [1] and is often diagnosed at a late stage [2][3][4]. Screening with low-dose computed tomography (CT) scans is recommended in the USA and some European countries based on the results from large screening trials, including the US-based National Lung Screening Trial (NLST) [5] and the Dutch-Belgian Nederlands-Leuvens Longkanker Screenings Onderzoek (NELSON) [6]. These studies concluded that there was a reduction in mortality of at least 20% in the CT-screened cohort. The Danish lung cancer screening trial found no significant reduction in mortality but reported a significantly larger fraction of early-stage cancers in the screening group [7].
CT scans produce a large number of false positive findings. The NLST reported a false-positive rate of 26.6% [5]. The NELSON trial reported a false-positive rate of 1.2% but approximately 20% of all tests at baseline were deemed indeterminate and required a repeated CT scan [6]. According to the Lung-RADS guidelines [8], patients with solid nodules ≥8 mm should be followed up with CT scans every three months. Such an initiative would cause great strain on departments of radiology and pulmonary medicine, and a screening program in general represents a significant economic burden.
Additional diagnostic criteria to help confirm or reject a cancer diagnosis could improve cost-benefit and reduce workload [9,10]. Risk prediction models, such as the PLCOm2012 model, showed better sensitivity without loss of specificity compared to the NLST selection criteria [11]. Risk prediction models could be further improved by including biomarkers [12].
Circulating tumor-specific DNA (ctDNA) has been suggested as a promising biomarker for diagnosing lung cancer. However, studies have indicated that small or early-stage lung tumors do not shed as much DNA into the circulation as larger tumors and hence are more difficult to detect in the blood [13,14]. This could be overcome by using material collected closer to the tumor site, as this material likely contains more tumor DNA. This idea was indicated in previous studies comparing tumor DNA detected in blood and sputum [15,16] to tumor DNA detected in bronchial washings [17].
Aberrant methylation of the promoter region can affect gene expression and has been linked to cancer development [18]. Hypermethylation usually inhibits gene transcription, while hypomethylation increases transcription [18]. The homeobox A9 (HOXA9) gene encodes a DNA-binding transcription factor [19]. HOXA9 has been shown to be dysregulated in many solid tumors, including lung cancer [20], and in vitro experiments have found downregulation of HOXA9 to enhance migratory potential [21] and stimulate cell invasiveness [22]. Hypermethylated HOXA9 has been suggested as a diagnostic biomarker in lung cancer [15,23], and our group has shown that hypermethylated HOXA9 is a negative prognostic biomarker in advanced lung cancer [24].
The objectives of this study were to develop and validate the use of methylated HOXA9 in bronchial lavage fluid as a diagnostic biomarker of lung cancer. We hypothesize that methylated HOXA9 in bronchial lavage fluid can serve as a valuable adjunct in the diagnosis of lung cancer.
We conclude that methylated HOXA9 in bronchial lavage holds potential as a supplementary tool in the diagnosis of lung cancer because it has a clinically relevant sensitivity and specificity.

Participants and Study Design
Participants were prospectively enrolled in this observational study at the Department of Medicine, Vejle Hospital, University Hospital of Southern Denmark, from October 2018 to December 2019. The first 107 participants were allocated to the Discovery cohort, and the subsequent 100 participants were allocated to the Validation cohort. Inclusion criteria were referral for diagnostic work-up, including bronchoscopy, on suspicion of lung cancer and age > 18 years. The exclusion criterion was severe comorbidity preventing the participant from completing the planned follow-up procedures. For the Validation cohort, a previous diagnosis of lung cancer was introduced as an additional exclusion criterion. Participants were followed for at least six months after enrolment.
The study was performed according to the Declaration of Helsinki of 1975 and the Danish data protection legislation. All participants provided written, informed consent before enrolling in the study. The study protocol was approved by the Regional Committee for Health Research Ethics of Southern Denmark (No. S-20180052, 22 June 2018).

Definition of Patient Characteristics
Employment status was evaluated from the medical record. If the participant was older than 65 years, he/she was considered retired unless otherwise specified in the medical record. Smoking status was categorized as never if the participant had smoked less than one pack year in his/her lifetime and as ever if he/she had smoked more than one pack year. One pack year was defined as 20 cigarettes per day for a year or the equivalent in other tobacco types. Performance status was categorized as defined by the Eastern Cooperative Oncology Group. FEV1 was the forced expiratory volume of air in 1 s, recorded in liters. Any comorbidity was defined as the participant having any condition that required regular medication. Cancer within five years was defined as any diagnosis of malignancy within the past five years except non-melanoma skin cancer and carcinoma in situ of the cervix uteri, while previous lung cancer was defined as any previous diagnosis of lung cancer. All clinical participant characteristics were recorded at the first doctor's appointment.
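The pack-year definition above can be written out as a small calculation. The following is a minimal sketch, not the study's own code: the function names are hypothetical, and counting exactly one pack year as "ever" is an assumption, because the text only defines the categories for less than and more than one pack year.

```python
def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """One pack year = 20 cigarettes per day for one year."""
    return (cigarettes_per_day / 20.0) * years_smoked


def smoking_status(total_pack_years: float) -> str:
    """Return 'never' below one pack year, otherwise 'ever'.

    Assumption: exactly one pack year is counted as 'ever'; the text
    does not say how this boundary case was handled.
    """
    return "never" if total_pack_years < 1.0 else "ever"


# Hypothetical participants, for illustration only.
print(smoking_status(pack_years(10, 30)))  # 10 cigarettes/day for 30 years
print(smoking_status(pack_years(5, 2)))    # 5 cigarettes/day for 2 years
```

Other tobacco types would first have to be converted to cigarette equivalents, which the text mentions but does not specify.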

Diagnostic Work-Up
The standard diagnostic work-up on suspicion of lung cancer consisted of a CT scan of the chest and abdomen and, depending on the result, further investigations were initiated. These included: blood samples, full-body positron-emission tomography (PET) CT scan, bronchoscopy with endobronchial ultrasound (EBUS)-guided fine-needle aspiration, or CT-guided transthoracic needle aspiration. The results of these investigations were evaluated by a multidisciplinary tumor board consisting of doctors specializing in pulmonary medicine, radiology, nuclear medicine, pathology, thoracic surgery, and clinical oncology.

Bronchoscopy and Bronchial Lavage Sampling
Bronchoscopy was performed under general anesthesia with either a BF-1TH190 or BF-H190 Olympus video bronchoscope (Olympus, Shinjuku City, Tokyo, Japan). The bronchoscope was introduced through the tracheal tube. In the case of a visible lesion, the bronchoscope was positioned as closely as possible to the tumor, and, subsequently, 10 mL of sterile saline was instilled and retrieved. In the case of no visible lesion, the bronchoscope was positioned as closely as possible to the tumor site, as identified on the CT scan, and bronchial lavage was performed. The project samples were collected after lavage samples had been taken for cytological and microbiological examination. Biopsies were taken either directly from visible lesions or guided by endobronchial ultrasound.
We aimed to collect a sample volume of 10 mL because this is the standard volume used for clinical examinations. The recovery of bronchial lavage fluid ranged from 3-7 mL.

Reference Test
The gold standard reference in this study was a histopathology-confirmed diagnosis of lung cancer, as agreed upon by the tumor board. Patients with a histopathology-confirmed diagnosis of malignancy other than lung cancer were excluded from this study (Figure 1). Patients with a suspicious nodule on CT scan were categorized as controls if they did not have histopathology-confirmed lung cancer after six months.

Analysis of Methylated HOXA9
Bronchial lavage fluid was centrifuged at 300× g for 10 min, and the supernatant was frozen at −80 °C pending further analysis. DNA was extracted from 2 mL bronchial lavage fluid with the DSP Circulating DNA kit (Qiagen, Hilden, Germany) as recommended by the manufacturer. The method of analysis of methylated HOXA9 has been published previously [25][26][27]. Briefly, DNA was bisulfite converted using the EZ DNA Methylation-Lightning Kit (Zymo Research, Irvine, CA, USA) according to the manufacturer's instructions. The bisulfite-converted DNA was analyzed for methylated HOXA9 by an in-house methylation-specific droplet digital polymerase chain reaction (ddPCR) assay and read on a QX100 Droplet Digital Reader (Bio-Rad, Hercules, CA, USA). Several checkpoints ensured optimal performance of the assay. Spike-in of CPP1 served as an internal control of DNA extraction efficacy [28], and the beta-2 microglobulin gene was used as a surrogate for the total amount of cell-free DNA before bisulfite conversion. This checkpoint was a quality control of the DNA extraction step. The methylation-specific ddPCR assay included water as a negative control, a pool of lymphocyte DNA from healthy donors as a non-cancer control, and Universal Methylated DNA Standard (Zymo Research, Irvine, CA, USA) as a methylated control. The primer and probe sequences can be found in an online data supplement. Analysis of methylated HOXA9 was performed blinded to the clinical endpoint.

Determining the Optimal Cut-Off for Methylated HOXA9
A volume of 2 mL bronchial lavage fluid had the best performance regarding cell-free DNA yield and the least degree of PCR inhibition.
The limit of blank was set at ≤ 4 droplets containing methylated HOXA9, based on data from 50 healthy donors in a plasma-based assay which was validated on another 50 donors [27]. For all samples with ≥ 5 methylated HOXA9-containing droplets, methylated HOXA9 was normalized to the albumin gene [29] using the formula:
Methylated HOXA9 (%) = (methylated HOXA9 copies / albumin copies) × 100    (1)
Methylated HOXA9 was normalized to albumin to diminish the effect of the total cell-free DNA level on the results. The levels of methylated HOXA9 are illustrated in the Supplementary Materials (Figure S1). The normalized values were used to establish the optimal cut-off in a receiver operating characteristics (ROC) analysis of the Discovery cohort. The area under the curve (AUC) was 81.5% (95% CI 73.9-89.1%). The optimal cut-off was ≥0.13%, which resulted in 77.2% correctly classified samples and a sensitivity and specificity of 73.1% and 85.3%, respectively. The cut-off was chosen to represent the level of methylated HOXA9 which resulted in the highest number of correctly classified samples (true positive plus true negative samples). This cut-off was then applied to the Validation cohort and used to dichotomize methylated HOXA9 for use in the statistical analyses. Participants with methylated HOXA9 ≥ 0.13% were considered HOXA9-positive in the statistical analyses, while participants with methylated HOXA9 < 0.13% were considered HOXA9-negative.
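The normalization, dichotomization and cut-off selection described in this section can be sketched as follows. This is an illustrative sketch under stated assumptions, not the published assay pipeline: all function names are hypothetical, samples at or below the limit of blank are assumed to be reported as 0%, and the example values are invented for illustration.

```python
LIMIT_OF_BLANK_DROPLETS = 4  # from the text: <= 4 positive droplets counts as blank
CUTOFF_PERCENT = 0.13        # from the text: optimal cut-off in the Discovery cohort


def normalized_hoxa9(hoxa9_copies, albumin_copies, positive_droplets):
    """Methylated HOXA9 as a percentage of albumin copies, as in Formula (1).

    Assumption: samples at or below the limit of blank are reported as 0.0.
    """
    if positive_droplets <= LIMIT_OF_BLANK_DROPLETS:
        return 0.0
    return hoxa9_copies / albumin_copies * 100.0


def is_hoxa9_positive(percent, cutoff=CUTOFF_PERCENT):
    """HOXA9-positive if the normalized value is at or above the cut-off."""
    return percent >= cutoff


def best_cutoff(values, has_cancer):
    """Return (cut-off, correct count) maximizing true positives plus true negatives.

    Candidate cut-offs are the observed values; on ties the lowest one is kept.
    """
    best = (None, -1)
    for candidate in sorted(set(values)):
        correct = sum((v >= candidate) == y for v, y in zip(values, has_cancer))
        if correct > best[1]:
            best = (candidate, correct)
    return best


# Invented values for illustration only; these are not study data.
values = [0.0, 0.02, 0.10, 0.13, 0.50, 2.0]
has_cancer = [False, False, False, True, True, True]
print(best_cutoff(values, has_cancer))
```

Maximizing the number of correctly classified samples weights false positives and false negatives equally, so the chosen cut-off depends on the proportion of cases in the cohort used to derive it.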

Statistical Analysis
Categorical variables were reported as fractions (percentages), and continuous variables were reported as median and interquartile ranges (IQR). The Chi-squared test and Fisher's exact test were used to compare categorical values as appropriate. The Wilcoxon rank-sum test was used to test associations between continuous variables. Sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) were calculated for methylated HOXA9 status as a binary biomarker. A multiple logistic regression model was developed using smoking status (never vs. ever smoker) and age as clinical predictors and methylated HOXA9 status as a biomarker. The prediction models were depicted by ROC curves. The models were developed on data from the Discovery cohort and then fitted to data from the Validation cohort. All tests were two-sided; p-values < 0.05 were considered statistically significant. All analyses were performed using STATA 16IC (StataCorp LLC, College Station, TX, USA).
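The four diagnostic measures named above follow directly from a 2 × 2 table. The sketch below is an illustration in Python rather than the Stata code used in the study. The confidence intervals are assumed to be exact binomial (Clopper-Pearson) intervals, since the text does not state the interval method. The example counts are back-calculated from the reported Discovery cohort percentages and cohort sizes (101 participants, 34 controls) and are an assumption, not values copied from Table 3.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(g):
    """Root of an increasing function g on [0, 1] by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k successes out of n (assumed method)."""
    lower = 0.0 if k == 0 else _bisect(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)
    upper = 1.0 if k == n else _bisect(
        lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper


def diagnostic_measures(tp, fn, tn, fp):
    """Each measure as (estimate, CI lower, CI upper)."""
    pairs = {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
    }
    return {name: (k / n, *exact_ci(k, n)) for name, (k, n) in pairs.items()}


# Back-calculated Discovery cohort counts (assumption, see lead-in).
measures = diagnostic_measures(tp=49, fn=18, tn=29, fp=5)
for name, (est, lo, hi) in measures.items():
    print(f"{name}: {est:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

With these counts, 49 + 29 of 101 samples are correctly classified, which agrees with the 77.2% stated for the Discovery cohort cut-off.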

Participant Characteristics
Participants were consecutively and prospectively enrolled from October 2018 to December 2019. The Discovery and Validation cohorts included 101 and 95 participants, respectively, as illustrated in Figure 1. Methylated HOXA9 status for the patients with other cancers is reported in the online data supplement (Table S1). The participant characteristics for cases and controls in both cohorts are summarized in Table 1. Generally, cases were older, less likely to be employed, had greater tobacco consumption and poorer lung function. Note that patients with a previous lung cancer diagnosis were included in the Discovery cohort but not in the Validation cohort. Tumor characteristics and histopathology are reported in Table 2. Not all participants had a visible tumor on the CT scan but were referred for further examination based on symptoms, e.g., hemoptysis. Unsurprisingly, cases tended to have longer tumor diameters than controls; adenocarcinomas comprised the largest proportion of the confirmed lung cancers followed by squamous cell carcinomas, and more than half of patients were diagnosed at stages 3 or 4.

Methylated HOXA9 and Lung Cancer
Methylated HOXA9 measured on bronchial lavage fluid was used as a binary diagnostic biomarker. The full range of diagnostic measures is reported in Table 3. Generally, this biomarker showed better specificity than sensitivity in the Discovery cohort, while the opposite was true for the Validation cohort. However, the confidence intervals overlapped considerably. The diagnostic measures for the Discovery cohort with previous lung cancers excluded can be viewed in the Supplementary Materials, Table S2.

Predictive Modelling
The prediction model was developed based on data from the Discovery cohort. We included only age and smoking status as clinical predictors and methylated HOXA9 status as a biomarker, given that there were only 34 controls in that cohort. The two clinical markers were chosen because they are the most commonly used criteria when selecting participants for lung cancer screening. Univariate logistic regression analyses performed on all participant characteristics can be seen in the online data supplement (Table S3). In the Discovery cohort, the clinical regression model had an AUC of 66.6% (95% CI […]-78.7%), while the model which included methylated HOXA9 status had an AUC of 84.9% (95% CI 77.3-92.4%, p < 0.001). In the Validation cohort, the AUCs were 71.7% (95% CI 61.4-82.1%) and 85.9% (95% CI 78.4-93.4%, p = 0.003), respectively, for the models without and with HOXA9 status. Please refer to the online data supplement for further information regarding the regression models and for a model on the Discovery cohort excluding previous lung cancers (Table S4). The models were visualized by ROC curves (Figure 2).
Table 3. Methylated HOXA9 as a diagnostic biomarker for lung cancer reported for the Discovery and Validation cohorts, respectively. HOXA9+ indicates detectable methylated HOXA9 (≥0.13%) in the bronchial lavage sample and hence a positive test. HOXA9− indicates a negative test with no detectable methylated HOXA9 (<0.13%). Sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) are reported with 95% confidence intervals (CI).
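The model comparison in this section rests on the AUC, which can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. The sketch below computes it that way for two sets of predicted probabilities. The function name and all numbers are hypothetical and serve only to illustrate the calculation; they are not outputs of the study's regression models.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Counts the share of case/control pairs in which the case has the
    higher score; tied pairs count one half.
    """
    cases = [s for s, y in zip(scores, labels) if y]
    controls = [s for s, y in zip(scores, labels) if not y]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Hypothetical predicted probabilities for six participants
# (three cases, three controls); invented for illustration.
labels = [1, 1, 1, 0, 0, 0]
clinical_model = [0.6, 0.4, 0.7, 0.5, 0.3, 0.6]
biomarker_model = [0.9, 0.35, 0.8, 0.2, 0.3, 0.4]

print(f"clinical model AUC:  {auc(clinical_model, labels):.3f}")
print(f"biomarker model AUC: {auc(biomarker_model, labels):.3f}")
```

The pairwise formulation is quadratic in the number of participants, which is unproblematic for cohorts of about 100 participants.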

A multiple logistic regression model showed a statistically significant diagnostic impact of methylated HOXA9 when adjusting for age and smoking status (Table 4). Further information about the models can be found in the Supplementary Materials (Table S5).
Table 4. Multiple logistic regression model developed on data from the Discovery cohort and subsequently applied to data from the Validation cohort. The model included data on 101 and 95 participants, respectively, from the Discovery and Validation cohorts. * Statistically significant impact.

Discussion
Low-dose CT-based screening for lung cancer is likely to be introduced in Denmark in the near future, resulting in an increased workload for hospital-based health professionals and many false positive or indeterminate findings. Biomarkers could be used to improve risk assessment before CT-based screening [12]. In the present biomarker validation study, we found that detectable methylated HOXA9 in bronchial lavage fluid had a clinically relevant sensitivity and specificity. The biomarker continued to have diagnostic impact after adjusting for age and smoking status.
These results are in line with findings by Roncarati et al. [17], who reported a sensitivity and specificity of 97% and 74%, respectively, for a four-gene biomarker panel analyzed on bronchial washings. That study and the present study are similar regarding clinicopathological features and methylation analysis method. The higher sensitivity reported by Roncarati et al. could be due to the four-gene panel, as additional markers have previously been shown to increase sensitivity, albeit at the cost of specificity [30]. Roncarati et al. used the cell pellets from bronchial washings for DNA purification, while we used the supernatant. This could also cause differences in diagnostic impact.
In contrast, Villalba et al. [31] reported a sensitivity and specificity of 52% and 91%, respectively, for hypomethylation of transmembrane serine protease 4 (TMPRSS4) in bronchoalveolar lavage fluid from stages I-II non-small cell lung cancers. They observed no significant differences when considering all stages. This likely reflects the difference between the genes investigated, as the studies were similar in most other regards. TMPRSS4 encodes a membrane-bound serine protease with an unknown function [32], while HOXA9 encodes a transcription factor. These genes may differ with respect to when and how aberrant methylation develops during oncogenic transformation. Such potential differences are likely to explain the discrepancy between the studies.
We found that methylated HOXA9 status had a moderate sensitivity and a high specificity in the Discovery cohort, while this was somewhat reversed in the Validation cohort. Hence, the Discovery cohort had more false negative results, and the Validation cohort had more false positive results. Peripheral tumors are more difficult to visualize with a bronchoscope, and the bronchial lavage may be performed some distance from the tumor. This could result in false negative test results. The Validation cohort had a false positive rate of more than 20%. False positive results were evenly distributed among patients with cryptogenic organizing pneumonia, granulomatous inflammation and acute inflammatory disease. They could represent patients with pre-malignant lesions or lesions which would spontaneously resolve over time. A study by Wong Doo et al. [33] suggested that aberrant DNA methylation in blood is present years before a diagnosis of mature B-cell neoplasm is confirmed. This supports the idea that hypermethylation of HOXA9 is an early event which may or may not lead to malignant transformation. Closer monitoring could be a possible implication of detecting hypermethylated HOXA9 in bronchial lavage from a patient during diagnostic workup. HOXA9 has previously been shown to be downregulated in response to inflammatory signals [34], which may explain the aberrant hypermethylation in some of our patients with inflammatory disease. Finally, the two cohorts had different exclusion criteria regarding previous lung cancer. This may also explain some of the observed differences, although we would have expected more false positive results in the Discovery cohort, which allowed enrolment of participants with a previous lung cancer.
The implementation of CT-based screening on a high-risk population could lead to an increase in redundant, and possibly dangerous, invasive diagnostic procedures. The NLST reported a rate of 1.4% for at least one complication after diagnostic work-up in the CT group [5]. The NELSON trial reported no adverse events at all [6]. According to the NELSON trial, they were restrictive in referring patients for invasive diagnostic work-up, which may have contributed to the low complication rate. However, a meta-analysis of transthoracic biopsy from 2017 found a risk of pneumothorax of 25.3% and 18.8% for core biopsy and fine needle aspiration, respectively [35]. The major complication rates for these biopsy modalities were, respectively, 5.7% and 4.4%. An increase in the number of patients referred for diagnostic work-up would likely increase the number of patients eligible for biopsy, whether transthoracic or by bronchoscopy. In this respect, analyzing bronchial lavage fluid for tumor DNA could be used as a supplementary tool to identify patients with lung cancer in cases when biopsy would entail considerable risk of complications. Blood-based methylation markers would be easier and less invasive to collect, but they may not be as organ specific since many genes are aberrantly methylated in numerous tumors.
The main limitation of the present study was the modest cohort size. A larger cohort is required to be able to include all relevant clinical variables in the prediction model. There were different event rates in the two cohorts, and more lung cancer patients in the Discovery cohort. The participants were recruited consecutively; however, participants with a previous lung cancer diagnosis were not included in the Validation cohort. This was because of the potential risk of increasing the number of false positive results among these participants. The methylated HOXA9 cut-off and the multiple logistic regression model were developed on data from the Discovery cohort. This could explain some of the differences in the diagnostic properties observed between the two cohorts. We did not use a structured questionnaire or interview guide for registering participant characteristics but relied on the information obtained by the doctor in the medical record. A structured approach would have generated more reliable data. Analysis of tumor DNA in bronchial lavage is a relatively new approach in the diagnosis of lung cancer, and there is no consensus on the best material to use. Sputum [15], pleural effusion [30], bronchoalveolar lavage fluid [31] and bronchial lavage/bronchial wash fluid [17] have all been suggested. We chose to analyze bronchial lavage using only the supernatant, because we aimed to detect cell-free tumor DNA. In future studies it would be relevant to compare the DNA yield and diagnostic accuracy between pellet and supernatant methods.

Conclusions
In conclusion, we find that methylated HOXA9 in bronchial lavage holds potential as a supplementary tool in the diagnosis of lung cancer because it has a clinically relevant sensitivity and specificity. Methylated HOXA9 remained significant when adjusting for age and smoking status in a predictive model. Routine clinical application awaits further validation in a clinical trial.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/cancers13164223/s1, Figure S1: Methylated HOXA9 in bronchial lavage fluid from participants examined for lung cancer, Table S1: Participants excluded because they had cancers other than lung cancer, Table S2: Diagnostic measures, Table S3: Univariate logistic regression analyses, Table S4: Biomarker multiple logistic regression model for prediction of lung cancer, Table S5: Clinical multiple logistic regression model for prediction of lung cancer.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to ethical considerations.