Salivary MicroRNA Signature for Diagnosis of Endometriosis

Background: Endometriosis diagnosis constitutes a considerable economic burden for the healthcare system, as diagnostic tools are often inconclusive and insufficiently accurate. We sought to analyze the human miRNAome to define a saliva-based diagnostic miRNA signature for endometriosis. Methods: We performed the prospective ENDO-miRNA study, involving 200 saliva samples collected between January and June 2021 from 200 women with chronic pelvic pain suggestive of endometriosis. The study consisted of two parts: (i) identification of a biomarker based on genome-wide miRNA expression profiling by small RNA sequencing using next-generation sequencing (NGS) and (ii) development of a saliva-based miRNA diagnostic signature according to expression and accuracy profiling using a Random Forest algorithm. Results: Among the 200 patients, 76.5% (n = 153) were diagnosed with endometriosis and 23.5% (n = 47) without (controls). Small RNA-seq of the 200 saliva samples yielded ~4642 M raw sequencing reads (from ~13.7 M to ~39.3 M reads/sample). Quantification of the filtered reads and identification of known miRNAs yielded ~190 M sequences that were mapped to 2561 known miRNAs. From these 2561 known miRNAs, feature selection with the Random Forest algorithm followed by internal cross-validation generated a saliva signature of endometriosis composed of 109 miRNAs. The sensitivity, specificity, and AUC of the diagnostic miRNA signature were 96.7%, 100%, and 98.3%, respectively. Conclusions: The ENDO-miRNA study is the first prospective study to report a saliva-based diagnostic miRNA signature for endometriosis. This signature could help improve early diagnosis by providing a non-invasive tool easily available in any healthcare system.


Introduction
Endometriosis, defined by the presence of endometrium-like tissue outside the uterus, affects 2-10% of the female population, i.e., around 190 million women worldwide. It is a heterogeneous disease with a poorly understood natural history and as such poses many challenges [1,2]. The first is timely diagnosis, mainly because the symptoms of endometriosis are non-specific and clinical examination is often either negative or leads to a wrong diagnosis [3,4]. The second is that complementary investigations, especially biomarkers [5,6] and imaging [5-8], are often inconclusive: they fail to diagnose early-stage endometriosis with sufficient accuracy or are of limited relevance for the severe forms. Consequently, therapeutic and follow-up strategies are compromised, and the rate of conventional treatment failure is high [2,4]. Finally, endometriosis constitutes a considerable economic burden for the healthcare system, linked not only to direct costs but also to indirect costs from school and work absenteeism. Overall, the annual cost of endometriosis was estimated at around 10,000 euros per patient in 2012, which is equivalent to that of diabetes in France in 2017 [9-13].
During the last decade, new diagnostic tools have been investigated to detect this debilitating disorder as early as possible [5,14-18]. Among these, microRNA (miRNA) analysis is emerging as a promising option, supported by a growing body of evidence from studies in cancer and degenerative disorders [19-23]. Human miRNAs are single-stranded, highly conserved, non-coding RNAs composed of 21-25 nucleotides. By binding partially to complementary sequences in messenger RNA (mRNA), they can promote mRNA degradation or repress its translation. It is estimated that about 60% of genes are regulated by miRNAs [24]. From a biological point of view, miRNAs are mainly transcribed from genes in intronic regions of coding or non-coding transcripts [24-26]. miRNAs are first transcribed in the nucleus as primary miRNAs (pri-miRNAs), hairpin-containing transcripts hundreds of nucleotides long, which are subsequently cleaved to generate precursor miRNAs (pre-miRNAs). These pre-miRNAs are then transported from the nucleus to the cytoplasm, where the duplexes are cleaved to form mature miRNAs, which are incorporated into RNA-induced silencing complexes (RISC) that regulate gene expression post-transcriptionally by binding to the target mRNA [24-26]. Finally, miRNAs are released from cells into the circulation by various carriers, such as Argonaute, nucleophosmin 1, high-density lipoproteins, or extracellular vesicles (exosomes), which confer remarkable stability against endogenous RNases. The miRNAs can then be detected in human body fluids [24-26].
In the specific setting of endometriosis, several authors have evaluated the relevance of a blood-based miRNA signature, but the results are discordant because of methodological and control-group issues [19,20,22,23,27,28]. Indeed, in previous studies, the control groups were composed of asymptomatic patients, patients undergoing tubal ligation, patients with pelvic inflammatory disease, and/or patients with gynecologic disorders, without information on their symptoms [19,20,22,23].
Similarly, saliva miRNAome analysis has been investigated by several teams exploring biomarkers for numerous benign and malignant disorders, but never in the context of endometriosis [29-32].
Therefore, the aim of the prospective ENDO-miRNA study was to analyze the human miRNAome to differentiate patients with and without endometriosis and to define a saliva-based diagnostic miRNA signature for endometriosis, validated internally by cross-validation.

Ethics Statement
The data and saliva used for analysis were collected from the prospective ENDO-miRNA study (ClinicalTrials.gov Identifier: NCT04728152) under the Research Protocol ID RCB: 2020-A03297-32. Informed consent was obtained from all the participants. The study and data analysis followed the Standards for Reporting of Diagnostic Accuracy (STARD) reporting guidelines [33] (Annex S1). The study consisted of two parts: (i) identification of a biomarker based on genome-wide miRNA expression profiling by small RNA sequencing using next-generation sequencing (NGS) and (ii) development of a saliva-based miRNA diagnostic signature according to expression and accuracy profiling using a machine learning (ML) algorithm [17,21,34-42].

Study Population
The prospective ENDO-miRNA study included 200 saliva samples obtained from women with chronic pelvic pain suggestive of endometriosis. All the saliva samples were collected between January and June 2021. All the patients underwent laparoscopy (therapeutic or diagnostic) and/or MRI [5-8]. The laparoscopic procedures were systematically video-recorded and then analyzed by two operators (C.T., Y.D.), blinded to the symptoms and imaging findings, to confirm the presence or absence of endometriosis. For the patients who underwent laparoscopy, the diagnosis was confirmed by histology. All the patients diagnosed with endometriosis without laparoscopic evaluation had MRI features of deep endometriosis with colorectal involvement and/or endometriomas, confirmed by two expert radiologists. The miRNAs were analyzed blinded to the surgical and imaging findings. Following exploration by laparoscopy or MRI, the women were classified into two groups: an endometriosis group and a control group. The control group comprised women with various benign pathologies other than endometriosis and women with symptoms suggestive of endometriosis but without clinical or MRI features and with no endometriosis lesions found during laparoscopic inspection (complex patients). The study flow chart is reported in Figure 1. The patients with endometriosis were stratified according to the revised American Society for Reproductive Medicine (rASRM) classification [43].

Saliva Sample Collection
The saliva samples (2 mL) were collected in an all-in-one system including a nucleic acid stabilizing solution for collection, stabilization, and transportation (RE-100, DNA Genotek Inc., 2 Beaverbrook Road, Ottawa, ON, Canada) using an at-home kit (https://www.dnagenotek.com/us/products/collection-infectious-disease/OME-505.html, accessed on 1 December 2021). All the samples were stored at room temperature prior to shipping.

RNA Sample Extraction, Preparation and Quality Control
The RNA was isolated from each saliva sample using the miRNeasy Kit (Qiagen, Inc., Germantown, MD, USA) according to the manufacturer's instructions [29,31,44,45]. RNA quality was assessed using the Agilent Technologies TapeStation 2200. RNA-sequencing libraries were prepared using the QIAseq miRNA Library Kit (Qiagen) according to the manufacturer's instructions. Samples were indexed in batches of 96, with a targeted sequencing depth of 17 million reads per sample. Sequencing was performed with 100-base single-end reads on a NovaSeq 6000 sequencer (Illumina, San Diego, CA, USA) [46,47].

Differential Expression Analysis of miRNAs
miRNA expression was quantified with miRDeep2 v0.1.0 [51]. Differential expression tests were then conducted in DESeq2 for miRNAs with read counts in ≥1 of the samples. DESeq2 v1.20 integrates methodological advances with several novel features to facilitate a more quantitative analysis of comparative RNA-seq data, using shrinkage estimators for dispersion and fold change [52]. The resulting matrix was filtered for expressed miRNAs [53]. miRNAs were considered differentially expressed if the fold change was >1.5 (upregulated) or <0.5 (downregulated) and the p-value adjusted for multiple testing was <0.05 [52]. Annex S2 summarizes the miRNAome sequencing analysis pipeline, adapted from the methods of Potla et al. [44].
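The selection rule above can be sketched in Python as follows. This is a minimal illustration, not the study's pipeline: the authors used DESeq2 in R, whereas here we assume that the cut-offs of 1.5 and 0.5 apply to the linear fold change (endometriosis vs. control) and that multiple-testing adjustment uses the Benjamini-Hochberg procedure, which is the DESeq2 default. The helper names and the four miRNA values in the usage example are hypothetical.

```python
import numpy as np


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                        # indices of ascending p-values
    scaled = p[order] * m / np.arange(1, m + 1)  # p_(i) * m / i
    # Enforce monotonicity from the largest rank downwards, then cap at 1
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.minimum(adjusted, 1.0)
    out = np.empty(m)
    out[order] = adjusted
    return out


def select_de_mirnas(names, fold_change, pvals, up=1.5, down=0.5, alpha=0.05):
    """Label miRNAs 'up' or 'down' (hypothetical helper).

    Assumes linear fold changes (disease / control) and BH-adjusted p-values.
    """
    padj = benjamini_hochberg(pvals)
    calls = {}
    for name, fc, q in zip(names, fold_change, padj):
        if q >= alpha:
            continue                  # not significant after adjustment
        if fc > up:
            calls[name] = "up"
        elif fc < down:
            calls[name] = "down"
    return calls


# Hypothetical values for four miRNAs
names = ["miR-A", "miR-B", "miR-C", "miR-D"]
fold_change = [2.0, 0.4, 1.2, 3.0]
pvals = [0.001, 0.004, 0.0001, 0.2]
print(select_de_mirnas(names, fold_change, pvals))  # {'miR-A': 'up', 'miR-B': 'down'}
```

In this example, miR-C is significant after adjustment but lies between the two cut-offs, so it is not called; miR-D passes the fold-change cut-off but not the adjusted p-value threshold.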

Development and Validation of the Diagnostic Model
Random Forest (RF) was used to design the saliva signature [34-37,54-56]. The RF classifier is an ensemble method that trains several decision trees (DTs) in parallel using bootstrapping followed by aggregation, a combination jointly referred to as bagging. Bootstrapping means that each DT is trained on a different random subset of the training dataset, drawn with replacement, and considers random subsets of the available features. This ensures that each individual DT in the RF is unique, which reduces the overall variance of the RF classifier. For the final decision, the RF classifier aggregates the decisions of the individual DTs and consequently tends to generalize well [54]. The F1-score, sensitivity, specificity, and ROC AUC were calculated to assess and compare the diagnostic performance of the signature [57,58].
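A minimal scikit-learn sketch of the two ingredients described above, bagged decision trees and the diagnostic metrics, is given below. It runs on synthetic data from `make_classification` (200 samples with roughly the cohort's 76.5/23.5 class ratio and 500 stand-in "miRNA" features), not on the study data. Ranking features by the forest's impurity-based `feature_importances_` and keeping the top k is our assumption about how feature selection could be done; the exact procedure is not specified here, and the helper names, `k`, and the hyperparameters are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split


def diagnostic_metrics(y_true, y_pred, y_score):
    """Sensitivity, specificity, F1-score and ROC AUC for a binary classifier."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "sensitivity": tp / (tp + fn),   # true-positive rate
        "specificity": tn / (tn + fp),   # true-negative rate
        "f1": f1_score(y_true, y_pred),
        "auc": roc_auc_score(y_true, y_score),
    }


def fit_rf_signature(X_train, y_train, k, seed=0):
    """Rank features with a bagged forest, keep the top k, refit (hypothetical helper)."""
    ranker = RandomForestClassifier(n_estimators=300, max_features="sqrt",
                                    bootstrap=True, random_state=seed)
    ranker.fit(X_train, y_train)
    selected = np.argsort(ranker.feature_importances_)[::-1][:k]
    model = RandomForestClassifier(n_estimators=300, max_features="sqrt",
                                   bootstrap=True, random_state=seed)
    model.fit(X_train[:, selected], y_train)
    return model, selected


# Synthetic stand-in for a cohort: label 1 = "endometriosis", 0 = "control"
X, y = make_classification(n_samples=200, n_features=500, n_informative=20,
                           weights=[0.235, 0.765], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model, selected = fit_rf_signature(X_train, y_train, k=109)
y_score = model.predict_proba(X_test[:, selected])[:, 1]   # P(label 1)
y_pred = model.predict(X_test[:, selected])
print(diagnostic_metrics(y_test, y_pred, y_score))
```

Features are ranked on the training split only; ranking them on the full data set before splitting would leak information from the held-out samples and inflate the reported metrics.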

Validation of the Signature Accuracy
The accuracy and reproducibility of the signature were tested on 10 randomly generated data sets [41,59,60], each preserving the initial ratio of endometriosis to control patients. Analysis was performed using Python (Python Software Foundation) with the XGBoost 1.3.3, scikit-learn 0.19.1, and scipy 1.1 packages.
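The repeated evaluation on 10 random data sets that preserve the cohort's class ratio can be approximated with scikit-learn's `StratifiedShuffleSplit`, as in the sketch below. We assume that each data set is a stratified train/test split and that metrics are computed on the held-out part; the 30% test fraction, the forest settings, and the synthetic data are illustrative assumptions, not the study's configuration. Specificity is computed as the recall of the control class.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import StratifiedShuffleSplit


def repeated_stratified_evaluation(X, y, n_splits=10, test_size=0.3, seed=0):
    """Evaluate a Random Forest on n_splits random splits that keep the class ratio."""
    splitter = StratifiedShuffleSplit(n_splits=n_splits, test_size=test_size,
                                      random_state=seed)
    results = []
    for train_idx, test_idx in splitter.split(X, y):
        clf = RandomForestClassifier(n_estimators=200, random_state=seed)
        clf.fit(X[train_idx], y[train_idx])
        y_true = y[test_idx]
        y_pred = clf.predict(X[test_idx])
        y_score = clf.predict_proba(X[test_idx])[:, 1]
        results.append({
            "sensitivity": recall_score(y_true, y_pred, pos_label=1),
            "specificity": recall_score(y_true, y_pred, pos_label=0),
            "auc": roc_auc_score(y_true, y_score),
            "positive_fraction": y_true.mean(),   # should mirror the full cohort
        })
    return results


# Synthetic stand-in data (label 1 = "endometriosis", 0 = "control")
X, y = make_classification(n_samples=200, n_features=300, n_informative=20,
                           weights=[0.235, 0.765], random_state=1)
runs = repeated_stratified_evaluation(X, y)
for metric in ("sensitivity", "specificity", "auc"):
    values = [r[metric] for r in runs]
    print(f"{metric}: {min(values):.3f} to {max(values):.3f}")
```

Reporting the minimum and maximum across splits gives ranges of the kind reported for the signature. Under the assumed 30% split, a control group of 47 patients leaves only about 14 controls per held-out set, so specificity changes in steps of roughly 7 percentage points.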

Other Statistical Analyses
Categorical variables were compared using the chi-square test. Values of p < 0.05 were considered to denote significant differences. Data were managed with an Excel database (Microsoft, Redmond, WA, USA) and analyzed using R 2.15 software (http://cran.r-project.org/, accessed on 1 December 2021).
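For readers working in Python rather than R, the chi-square test for a 2 × 2 table of categorical counts can be run with `scipy.stats.chi2_contingency`, as sketched below. The counts are hypothetical and serve only to show the call; by default SciPy applies Yates' continuity correction when the table has one degree of freedom.

```python
from scipy.stats import chi2_contingency


def chi_square_test(table):
    """Chi-square test of independence for a contingency table of counts."""
    chi2, p, dof, expected = chi2_contingency(table)  # Yates correction if dof == 1
    return chi2, p, dof


# Hypothetical counts: rows = endometriosis / control, columns = feature present / absent
table = [[90, 63],
         [20, 27]]
chi2, p, dof = chi_square_test(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.3f}")
print("significant" if p < 0.05 else "not significant")
```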

Description of the ENDO-miRNA Cohort
The clinical characteristics of the patients in the endometriosis and control groups are presented in Table 1. Among the 200 patients, 76.5% (n = 153) were diagnosed with endometriosis and 23.5% (n = 47) without (controls). In the control group, 51% (n = 24) of the women had no abnormality and were defined as discordant or complex patients (Table 1). There were no significant differences in age or BMI between the groups. The mean (±SD) time from symptom onset to diagnosis for endometriosis patients was 14.8 (±17.88) years. In both groups, the patients had pain symptoms suggestive of endometriosis. On a Visual Analogue Scale (VAS), patients with versus without endometriosis reported (mean ± SD) dysmenorrhea of 6 ± 3.4 versus 5 ± 3.2 (p < 0.001), dyspareunia of 5.28 ± 3.95 versus 4.95 ± 3.52 (p < 0.001), and urinary pain during menstruation of 4.35 ± 3.36 versus 2.84 ± 2.76 (p < 0.001). Among the endometriosis patients, 52% (n = 80) had rASRM stage I-II and 48% (n = 73) had stage III-IV.

Global Overview of miRNA Transcriptome
Small RNA-seq of the 200 saliva samples yielded ~4642 M raw sequencing reads (from ~13.7 M to ~39.3 M reads/sample). Pre-filtering and filtering steps retained 70% (~3205 M) of the initial raw reads. The majority of the filtered reads were short. Quantification of the filtered reads and identification of known miRNAs yielded ~190 M sequences that were mapped to 2561 known miRNAs from miRBase v21. The number of expressed miRNAs per sample ranged from 1250 (outlier) to 2561. The distribution of expressed miRNAs in the 200 saliva samples and the overall composition of processed reads are shown in Figure 2.

Figure 2. Overall composition of processed reads. RNA reads, miRNAs + piRNAs + rRNAs + tRNAs + mRNAs + others; Filtered Reads, reads with no adapters + reads with low-quality bases + reads too short; Not Characterized/Mappable reads, reads mapped to GRCh38 that could not be characterized as a particular type; Not Characterized/Not Mappable reads, reads that could not be mapped.

Saliva-Based Diagnostic Signature for Endometriosis
The overall performance of the diagnostic signature composed of 109 miRNAs (Random Forest model) on the 10 randomized data sets is reported in Table 2. The sensitivity, specificity, and AUC ranged from 80% to 96.8%, 80% to 100%, and 79.9% to 98.4%, respectively. After internal cross-validation on the 10 data sets, the signature achieved its highest accuracy, with a sensitivity, specificity, and AUC of 96.7%, 100%, and 98.3%, respectively (Table 2).

Relation between Pathophysiology of Endometriosis and miRNA Expression
Among the 109 miRNAs composing the endometriosis diagnostic signature, 77% (n = 84) have been reported to be associated with pathophysiologic pathways in benign and malignant disorders (Annex S3). Only miR-34c-5p and miR-19b-1-5p have previously been reported in the field of endometriosis. Of the 109 miRNAs of the signature, 29 (27%) are associated with the main signaling pathways of endometriosis: PI3K/Akt, PTEN, Wnt/β-catenin, HIF1α/NF-κB, and YAP/TAZ/EGFR (Annex S3).

Discussion
To the best of our knowledge, the ENDO-miRNA study is the first prospective study to report a saliva-based diagnostic miRNA signature for endometriosis. This signature could help improve early diagnosis by providing a non-invasive tool easily available in any healthcare system. Its value lies in combining the intrinsic ability of miRNAs to capture endometriosis phenotypes (and their heterogeneity) with the modeling power of AI. Its reproducibility relies on our bioinformatics approach to miRNA-sequencing analysis and on a statistical approach designed to address the complexity and heterogeneity of endometriosis.
We hypothesized that a saliva-based miRNA signature for endometriosis would be a low-cost and scalable method, allowing samples to be collected anywhere by anyone. The tool would thus be available to underprivileged populations, unlike methods based on blood samples, which depend on blood volume and temperature and impose complex logistics for collecting peripheral blood and transporting it to a laboratory for analysis.
Saliva is an increasingly attractive body fluid in the search for disease biomarkers [29-32,45,61]. miRNAs exhibit remarkable stability under severe conditions, such as extended storage [24,29,32,62]. Zheng et al. demonstrated that saliva, unlike serum, is not affected by coagulation, which can induce a release of miRNAs. This is especially important because many studies evaluating miRNA expression in endometriosis have been performed on serum [23,61]. Zhang et al. first developed a technique to stabilize saliva and process it for RNA analysis. The average RNA content of bodily fluids ranges from as low as 0.01 mg per liter in urine to as high as 11.2 mg per liter in saliva [63]. Moreover, blood, leukocytes, and saliva show lower standard deviations in their RNA content (<50% on average) than serum and urine. The average concentration of isolated RNA can be classified as high (>20 ng/µL) for blood, leukocytes, saliva, and cell-free saliva and low (<10 ng/µL) for plasma, serum, urine, and cell-free urine. Finally, more than 90% of miRNAs in saliva are shared with blood, leukocytes, and plasma, which further supports its stability and reproducibility [63].
In the specific setting of endometriosis, and despite the variety of endometriosis phenotypes, we were able to build an endometriosis diagnostic signature. The most accurate signature in our model provided a sensitivity, specificity, and AUC of 96.7%, 100%, and 98.3%, respectively. These values attest to the high accuracy of the signature, support its clinical value, and raise the question of revising the current diagnostic strategy for patients with symptoms suggestive of endometriosis, which is based on diagnostic laparoscopy.
Multiple biomarkers have been suggested as screening and triage tests to diagnose endometriosis [14,15], but none of them reaches sufficient accuracy, i.e., a sensitivity of at least 0.94 and a specificity of at least 0.79 [5,14,15]. According to the 2011 Biomarkers Definitions Working Group, a biomarker is "a characteristic that can be objectively measured and evaluated as an indicator of normal biological or pathogenic processes, or as an indicator of pharmacologic response to therapeutic interventions" [64]. The present study quantified and analyzed the miRNAome of (i) discordant/complex patients (women with chronic pelvic pain suggestive of endometriosis and both negative clinical examination and imaging findings), (ii) women with early-stage (rASRM stage I-II) and advanced-stage (rASRM stage III-IV) endometriosis, and (iii) women with other gynecological disorders sharing symptoms of endometriosis. To meet biomarker criteria, we reasoned that an exhaustive evaluation of all miRNAs in 200 saliva samples was needed to unearth the complexity of the disease and its heterogeneity. To our knowledge, this is the first exhaustive sequencing of the human saliva miRNAome in the specific context of endometriosis, and we show that 97.3% of all miRNAs are detectable in saliva with homogeneous read stability. Our analysis resulted in the selection of a robustly tested set of 109 miRNAs.
Several studies have reported aberrant expression of miRNAs in affected tissues or peripheral blood samples of patients with endometriosis [20,22,23,62-64]. Several miRNAs have been shown to be dysregulated during the pathogenic process of endometriosis [20,22,23], and the diagnostic power of several miRNAs has been assessed [19,20,22,23,65]. For example, Maged et al. showed that serum miR-122 and miR-199a had sensitivities of 95.6% and 100.0% and specificities of 91.4% and 100%, respectively, for diagnosing disease status in women, suggesting that these miRNAs are putative serum biomarkers for endometriosis [66]. To date, Moustafa et al. [65] are the only team to have attempted to build a blood-based miRNA diagnostic signature for endometriosis, composed of six miRNAs selected by Random Forest analysis. In agreement with previous studies [40-42], it seems unlikely that so few miRNAs could reflect the diversity of a multifactorial disorder such as endometriosis, which involves multiple and poorly known signaling pathways. We therefore hypothesized the value of (i) analyzing a specific selection of miRNAs, which resulted in a selection of 109; (ii) reducing the number of features to improve the final accuracy; and (iii) using a Random Forest model with high accuracy, which supports the value of AI technology. Such an approach has previously been validated in a study showing that a 100-miRNA blood signature was sufficiently stable to provide almost the same classification accuracy across different types of cancers and platforms [40,41]. Previous studies have demonstrated that saliva miRNA expression analysis can differentiate Crohn's disease from ulcerative colitis and is of value in head and neck, pancreatic-biliary tract, and oral cancers, but they did not report a true diagnostic signature [29,31]. Nevertheless, Rapado-González et al. reported a salivary signature composed of 22 miRNAs distinguishing colorectal cancer patients from healthy individuals in the discovery phase [30]. Moreover, Cheng et al. demonstrated the relevance of saliva miRNA expression for diagnosing Yang Deficiency and Yin Deficiency, with panels of 81 and 96 miRNAs, respectively [67].
From a pathophysiologic point of view, after systematic review, only four of the 109 miRNAs of our endometriosis signature (miR-34c-5p, miR-19b-1-5p, miR-149-5p, and miR-378a-3p) have previously been reported in patients with endometriosis; 25 have not been reported previously, implying that further studies are needed to confirm their involvement in the pathophysiology of endometriosis; and the remaining 80 are known to be involved in various signaling pathways, such as PI3K/Akt, PTEN, Wnt/β-catenin, HIF1α/NF-κB, and YAP/TAZ/EGFR, with potential therapeutic implications.
Some limitations of the present study deserve discussion. First, although 97.3% of the human miRNAome was detectable and analyzed in the saliva samples, we cannot rule out the involvement of the remaining 2.7% of miRNAs in endometriosis. Second, as for blood sampling, a potential bias could be linked to hormonal treatment: some patients in both the endometriosis and control groups had undergone prior hormonal treatment that might have affected miRNA expression. However, previous studies [19,65] have reported no significant miRNA changes either during the menstrual cycle or in response to sex steroid hormone therapies. Third, as previously mentioned for the blood-based miRNA signature, another potential bias could be related to the inclusion in the endometriosis group of patients with deep endometriosis and/or endometrioma without laparoscopic confirmation. However, previous studies have demonstrated the high accuracy of MRI in diagnosing endometrioma and deep endometriosis with colorectal involvement, meeting the criteria for a replacement test and an SnNout triage test [5,7]. Finally, although our prospective study is the largest available on miRNAs in saliva (n = 200) [61], the sample size, especially of the control group (n = 47), and the reliance on internal cross-validation warrant external validation. Our signature showed high accuracy in patients over 18 years old, but our population did not include adolescents; our results therefore cannot be extrapolated to this specific population, which requires further studies.

Conclusions and Perspectives
Despite some limitations of the current prospective study, our data support the use of a saliva-based diagnostic miRNA signature for endometriosis in diagnostic care pathways once external validation has confirmed these results. Saliva sampling is cheap and noninvasive and can be repeated multiple times, potentially improving both the diagnostic and therapeutic management of patients through early identification across all populations. Finally, beyond the context of endometriosis, our methodology could serve as a blueprint for investigating other pathologies, both benign and malignant.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/jcm11030612/s1. Annex S1: STARD 2015 (Standards for Reporting of Diagnostic Accuracy Studies); Annex S2: miRNAome sequencing analysis pipeline adapted from Potla et al. [44]; Annex S3: miRNA signature accuracy, pathophysiology and signaling pathways.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Conflicts of Interest:
The authors declare no conflict of interest.