Capturing the Kidney Transcriptome by Urinary Extracellular Vesicles—From Pre-Analytical Obstacles to Biomarker Research

Urinary extracellular vesicles (uEV) hold non-invasive RNA biomarkers for genitourinary tract diseases. However, missing knowledge about reference genes and effects of preanalytical choices hinder biomarker studies. We aimed to assess how preanalytical variables (urine storage temperature, isolation workflow) affect diabetic kidney disease (DKD)—linked miRNAs or kidney—linked miRNAs and mRNAs (kidney-RNAs) in uEV isolates and to discover stable reference mRNAs across diverse uEV datasets. We studied nine raw and normalized sequencing datasets including healthy controls and individuals with prostate cancer or type 1 diabetes with or without albuminuria. We focused on kidney-RNAs reviewing literature for DKD-linked miRNAs from kidney tissue, cell culture and uEV/urine experiments. RNAs were analyzed by expression heatmaps, hierarchical clustering and selecting stable mRNAs with normalized counts (>200) and minimal coefficient of variation. Kidney-RNAs were decreased after urine storage at −20 °C vs. −80 °C. Isolation workflows captured kidney-RNAs with different efficiencies. Ultracentrifugation captured DKD -linked miRNAs that separated healthy and diabetic macroalbuminuria groups. Eleven mRNAs were stably expressed across the datasets. Hence, pre-analytical choices had variable effects on kidney-RNAs—analyzing kidney-RNAs complemented global correlation, which could fade differences in some relevant RNAs. Replicating prior DKD-marker results and discovery of candidate reference mRNAs encourages further uEV biomarker studies.

Currently, effort is put on research and to set forth recommendations for uEV work, e.g., in sample handling, storage, uEV isolation and reporting [12][13][14][15][16][17][18]. This is highly important because there are vast differences in the pre-analytical, analytical and reporting procedures. For example, a recent survey by the Spanish Society for Research and Innovation (Spain) in Extracellular Vesicles (GEIVEX) found that the variability of preanalytical procedures can be as high as 94% [18]. Without some level of standardization, the biomarker discovery results are seldom highly robust or reproducible [19].
More specifically, one of the most pressing problem in the preanalytical part is that many collections in laboratories and biobanks may not be handled and stored optimally for uEV research. Moreover, only few studies have characterized the effect of pre-analytical variables on the uEV, especially regarding the end-point biomolecular level used in biomarker studies, e.g., the transcriptome [14]. Equally, only a few studies have comprehensively characterized the effect of EV isolation methods on transcriptomics [13,15,16,[20][21][22]. Thus, it is difficult to compare results between dissimilar studies.
Urinary EV capture kidney transcriptome and proteome ( [7,9,[23][24][25][26]. With focus on uEV RNAs, we and others have shown that uEV can capture the dysregulation of RNAs associated with pathological mechanisms of DKD such as oxidative stress [9], fibrosis [27], and inflammation [28]. We have previously shown by RNA sequencing technologies that some preanalytical variables such as urine storage temperature and isolation methods affect the uEV RNA yield and global miRNA and mRNA profiles [9,14]. However, for kidney research, it would be important to understand how exactly the pre-analytical choices affect the uEV as a "liquid kidney biopsy". Are all the uEV miRNAs and mRNAs-highly or specifically expressed by the kidney and from different disease mechanism pathwaysaffected by the different preanalytical variables and to which extent? Are the kidney derived RNAs for example missing completely or just downregulated and therefore still available as biomarkers?
Urinary EV reference genes represent another unmet need in the EV field. Both research on EV reference genes and recommendations on how to select the reference genes are increasing [29,30]. However, only a moderate number of sequencing datasets are currently available for rigorous search of robust reference genes that would be stable across studies, at least for uEV. Again, the effect of preanalytical variables, or demographic and disease status variables, on the stability of reference genes is not clear. This represents a problem for qPCR validation experiments. GAPDH is commonly utilized to normalize gene expression but does not work equally fine for all tissues, biofluids, or disease status [31][32][33].
In conclusion, it is unclear how candidate markers reported by different studies could be replicated under different experimental conditions.
In this study, we assessed the effect of storage temperature and uEV isolation workflows on uEV transcriptome by focusing on highly expressed miRNAs and enriched genes of the kidney. We assessed the replicability of previously described candidate miRNA markers of DKD and explored the existence of reference genes across diverse uEV sequencing datasets.

miRNA and mRNA Sequencing Datasets
The datasets included in this study were retrieved from previous publications from our group describing the pre-analytical and analytical parts including quality control in detail [6,9,13,14]. Details for each dataset are included in Table 1. For the storage temperature study, urine samples were divided in two aliquots on the collection day, and they were stored at −20 • C or −80 • C for 13-16 months. Importantly, temperature sample pairs were always stored for equal times i.e., the isolation of extracellular vesicles was done the same day. Similarly, there were no differences in the storage time for isolation workflow, overnight (ON)/24 h collections (24 h), and pre-clearing studies between the sample pairs. Of note, except for the isolation workflow dataset, the rest of the samples were processed by ultracentrifugation.

Kidney Top Expressed miRNAs and Kidney Enriched mRNAs in uEV
Kidney enriched genes ("At least four-fold higher mRNA level in kidney compared to the average level in all other tissues") were retrieved from The Human Protein Atlas, v20 [34,35] (www.proteinatlas.org) (accessed on 19 November 2020). For miRNAs, we used top kidney expressed miRNAs (40 miRNAs with highest expression in the kidney) which were retrieved from miRNATissueAtlas2 [36] (https://ccb-web.cs.uni-saarland.de/ tissueatlas2) (accessed on 17 June 2022). For these analysis, raw sequencing counts were normalized as described in the original publications by using TMM (trimmed mean of M values) [37] in edgeR [38] or DEseq2 normalization [39,40].

Literature Review of miRNAs Associated with DKD
We did a literature review of miRNAs associated with DKD based on evidence from tissue (human or animal models) or in vitro models and for miRNAs based on evidence from human urine, urinary sediments or uEV (differential expression padj < 0.05). For the latter, some studies reported miRNAs with nominal p-values, and in such cases we included only the miRNAs that had been also validated with another quantification method or by using in-vitro or in-vivo models. To search the DKD associated miRNAs in our uEV dataset we used the reported identifiers i.e., if the literature only provided the stem identifier, we searched the immature miRNA in our dataset and not the mature miRNAs (-3p/-5p).

Stable mRNAs across Datasets
All datasets were normalized using TMM normalization. Of note, samples from overnight and 24 h collections, with and without pre-clearing and technical replicas were normalized together and we refer to this dataset as "technical dataset". Genes with normalized counts of CPM > 200 in all samples were filtered to calculate the coefficient of variation (CV). The top 100 genes with the lowest CVs were selected from each experimental dataset and the gene lists were compared to identify shared genes across datasets. To assess the stable genes functions we used gene cards [41] (www.genecards.org) (accessed on 27 June 2023) and to assess to which pathways the stable genes contribute to, we used Uniprot knowledge base (UniProt Consortium 2023) (https://www.uniprot.org/) (accessed on 28 April 2023). Protein interaction was assessed using STRING V11.5 [42] (https://string-db.org/) (accessed on 28 April 2023).

Data Visualization
For data visualization, built-in R functions or packages ggplot2 [43], pheatmap [44], and reshape2 [45], were used. Values are represented as mean ± SEM (standard error of the mean). Figure panels were prepared using corelDRAW 2022 v24.1.0360 (Corel Corporation, Ottawa, ON, Canada). Some of the results presented here are part of Karina Barrreiro's dissertation which is accessible in the Digital Repository of the University of Helsinki (HELDA). Table 1. Datasets included in this study. Diabetic kidney disease (DKD), hydrostatic filtration dialysis (HFD), overnight (ON), prostate cancer (PCa), ultracentrifugation (UC), urinary extracellular vesicles (uEV), Urine Exosome Purification and RNA Isolation Midi Kit (NG). * NG was not included in the analysis due to the poor performance on RNA sequencing. ** 24 h urine collections were not pre-cleared and ON urine collections were pre-cleared.

Results
Our study focused on the kidney-linked and putative reference RNAs in uEV isolates targeting applicability for biomarker discovery. The uEV isolates used to generate the eleven sequencing datasets analyzed in this study were comprehensively characterized in our original publications (Table 1) by electron microscopy, Western blotting, and nanoparticle tracking, RNA fragment length and protein analysis. Briefly, this quality control indicated that the main population of uEV and RNA was small in size and length (<300 nm and <300 nt, respectively) and that the presence of e.g., remnant Tamm-Horsfall protein varied, but was not extensive.

Effect of PreAnalytical Variables on Kidney Transcriptome in uEV Isolates
In previous studies we determined that some preanalytical variables such as storage temperature affect the global uEV transcriptome [13,14]. As the uEV have shown potential as "liquid kidney biopsy" [9], we now assessed whether these preanalytical variables impact the kidney transcriptome in uEV isolates. Here we analyzed the expression level of "kidney-RNAs" i.e., top kidney expressed miRNAs and kidney enriched mRNAs.

Effect of Storage Temperature
To analyze the effect of urine storage temperature on miRNAs that have high expression in the kidney, we focused on the top 40 kidney expressed miRNAs. In our dataset (n = 8 samples), we found 29 out of the 40 miRNAs and for 22 of those, the normalized expression level was lower in urines stored at −20 • C than in urines stored at −80 • C ( Figure 1). Of note, two of the miRNAs were not detected at all in the −20 • C samples (Table S2).

Effect of PreAnalytical Variables on Kidney Transcriptome in uEV Isolates
In previous studies we determined that some preanalytical variables such as storage temperature affect the global uEV transcriptome [13,14]. As the uEV have shown potential as "liquid kidney biopsy" [9], we now assessed whether these preanalytical variables impact the kidney transcriptome in uEV isolates. Here we analyzed the expression level of "kidney-RNAs" i.e., top kidney expressed miRNAs and kidney enriched mRNAs.

Effect of Storage Temperature
To analyze the effect of urine storage temperature on miRNAs that have high expression in the kidney, we focused on the top 40 kidney expressed miRNAs. In our dataset (n = 8 samples), we found 29 out of the 40 miRNAs and for 22 of those, the normalized expression level was lower in urines stored at −20 °C than in urines stored at −80 °C ( Figure  1). Of note, two of the miRNAs were not detected at all in the −20 °C samples (Table S2). Out of 56 kidney enriched mRNAs we found 33 in our dataset. Analysis of the expression levels showed that 15 mRNAs were poorly represented in urines stored at −20 °C compared to the ones stored at −80 °C ( Figure 1A). Importantly, 10 of the mRNAs were not detected at all in the −20 °C samples (raw counts = 0) (Table S2). Out of 56 kidney enriched mRNAs we found 33 in our dataset. Analysis of the expression levels showed that 15 mRNAs were poorly represented in urines stored at −20 • C compared to the ones stored at −80 • C ( Figure 1A). Importantly, 10 of the mRNAs were not detected at all in the −20 • C samples (raw counts = 0) (Table S2).

Effect of Isolation Workflows
We next analyzed the effect of the EV isolation workflows on the uEV expression of kidney-RNAs. Out of 40 highly expressed miRNAs of the kidney, we found 36 in our datasets (n = 26 samples). All the miRNAs were stably expressed across the different isolation workflows but the expression of 18 miRNAs was lower in samples from HFD workflow (Figure 2A). We then analyzed differences in the normalized counts of these 18 miRNAs between HFD and UC samples (samples that showed low expression in HFD (4,5,6,8,9,10) and we observed that the normalized counts were systematically lower in HFD samples compared to UC, with differences ranging between 3-58%. MiRNAs with highest differences (>35%) were hsa-miR-101-3p, hsa-miR-26a-5p, hsa-miR-26b-5p, hsa-miR-27a-3p, hsa-miR-29c-3p. Regarding the kidney enriched genes, we found 31 out of the 56 and all of them had lower expression in samples from Norgen urine Exosome Purification and RNA Isolation Midi Kit (NG) ( Figure 2B). Five of the mRNA were not detected in any of the NG samples (raw counts = 0) and generally, many of the samples had raw count 0 (Table S3).  Both temperature and isolation workflow impacted the kidney transcriptome in uEV isolates and these differences are in some cases better captured by analyzing kidney-RNAs than by analyzing global expression.

Dysregulated miRNAs in Samples Stored at Suboptimal Temperature: Significance for Kidney Disease Biomarker Discovery
Previously, we reported different miRNA profiles from uEV isolated from urines stored at −20 • C vs. −80 • C [14] (from now on, for simplicity, we will refer to these samples as "urines stored at −20 • C or −80 • C"). Specifically, by differential expression analysis of normalized counts, we found 29 downregulated and 4 upregulated uEV miRNAs in urines stored at −20 • C compared to the ones stored at −80 • C. To assess the biological relevance of the dysregulated miRNAs, we performed a literature review and found that 25/33 miRNAs were associated with kidney diseases ( Table 2). In addition, a careful comparison of the raw and normalized counts revealed that most of the downregulated miRNAs in urines stored at −20 • C failed to be detected (raw counts = 0), while the 4 downregulated miRNAs in urines stored at −80 • C were stably expressed across samples and had high raw counts (Tables reftabref:genes-2445271-t002 and S1). Thus, in urines stored at −20 • C, a significant number of potential kidney disease markers were lost, and the upregulated genes' raw counts were actually lower than in urines stored at −20 • C. −80 • C samples. Table 2. Down-and up-regulated miRNAs in uEV derived from urines stored at −20 • C for up to 1 year. Acute kidney injury (AKI), chronic kidney disease (CKD), diabetic kidney disease (DKD), lipopolysaccharide (LPS), streptozotocin (STZ), urinary extracellular vesicle (uEV).
Upregulated in a mouse model of renal fibrosis [64].
Downregulation associated with podocyte injury induced by high glucose [65].
Upregulation increases fibrosis in mouse and in vitro [67].

Replication of DKD-Associated miRNA by UC-Based uEV Isolation and Sequencing Workflow
Prior research has reported many miRNAs that associate with DKD in T1D and/or T2D. Thus, we carried out a literature search to generate a list of these DKD -linked miRNAs (padj < 0.05 or p < 0.05 and other evidence of association, see methods) and used it for studying their expression in the UC-isolated uEV from DKD patients vs. heathy controls (n = 10 samples). We found (i) 107 miRNAs based on evidence from tissue (human or disease models) or in vitro models and (ii) 63 miRNAs based on evidence from human urine, urinary sediments or uEV (Tables 3 and 4). MiRNAs dysregulated in tissue or in vitro models were associated to previously described DKD pathways including inflammation, fibrosis, podocyte injury, and oxidative stress (Table 3). We found 12 miRNAs in common between miRNAs deregulated in tissue or in vitro and urine/urine sediments or uEV (highlighted in bold text in Table 4), namely hsa-miR-214-3p, hsa-miR-192, hsa-miR-200c, hsa-miR-15b-5p, hsa-miR-30c-5p, hsa-miR-30b-5p, hsa-miR-21-5p, hsa-miR-30e-5p, hsa-miR-200c-3p, hsa-miR-200a-3p, hsa-miR-155-5p and hsa-miR-29b-3p, which have been shown to modulate hypertrophy, fibrosis, inflammation, and apoptosis. Table 3. MiRNAs associated with DKD development and/or progression with evidence in kidney tissue and/or cell lines. Reported target genes for the dysregulated miRNAs have direct regulation evidence (e.g., luciferase reporter assay).

Sample Groups Upregulated miRNAs Downregulated miRNAs Reference
Urine * Urine and plasma from T1D and DKD miR-30e-5p [154] uEV * T2D DKD, T2D normal renal function, and non-T2D CKD miR-21-5p miR-30b-5p [28] Urine ** DKD and non-diabetic renal disease T2D vs. non-diabetic renal disease: miR-27-3p, miR-1228 [155] uEV * T2D and normoalbuminuria, microalbuminuria or macroalbuminuria and healthy donors miR-15b-5p [72] uEV * TD2 DKD and healthy donors miR-30e-3p, miR-30c-5p, miR-190a-5p, miR-98-3p, let-7a-3p, miR-30b-5p, and let-7f-1-3p [156] We analysed the expression levels of the miRNA from the literature review (i.e., Tables 2 and 3) in our uEV data (UC isolation workflow dataset) using expression heatmaps and checked whether the miRNAs could cluster the healthy control and macroalbuminuria groups separately by hierarchical clustering. Our uEV set showed expression of 39 out of 107 miRNAs (36%) dysregulated in DKD with evidence from tissue and in vitro studies, but they did not separate the groups ( Figure 3A). However, our uEV set showed expression of a higher proportion of miRNAs-31 out of 63 (49%)-that were dysregulated in DKD with evidence from urine, urine sediment or uEV. Importantly, this set of miRNAs could divide the DKD and healthy control groups into separate clusters and this was observed both by hierarchical clustering and principal component analysis ( Figure 3B,C). We focused on the miRNAs with the biggest fold changes that were located on the first and fourth (last) cluster of the heatmap in Figure 3B-they separated the groups by principal component analysis even more evidently than the 31 mRNAs ( Figure 3D). From those miRNAs, we compared the direction of change between the literature review and our dataset. For the first cluster, miR-30b-5p, miR-221-3p, let-7f-1-3p, and let-7a-3p followed the same direction of change in both i.e., downregulated in DKD. In contrast, miR-15b-5p was upregulated in the literature with evidence from uEV/urine or urine sediments but downregulated in our uEV dataset. For the fourth cluster, all miRNAs (miR-424-5p, miR-486-3p, miR-335-5p, miR-126-3p) had the same direction of change than what was found in the literature i.e., upregulated in DKD. Moreover, all the miRNAs had evidence of association with DKD in vitro or in vivo and/or association with DKD pathways (in kidney or other cells) ( Table 5).
To assess whether these 31 miRNAs would show some specificity for DKD, we carried a similar analysis using our uEV PCa dataset. Supporting DKD specificity, the analysis did not separate the PCa patients from healthy controls ( Figure 3E).
Taken together, despite variability between experimental setups, some of the uEV/urine/ urine sediment miRNAs presenting candidate markers associated with DKD in the literature were confirmed in our uEV dataset and expression level changes between experimental groups were concordant.

Exploratory Analysis of Reference mRNAs in uEV
To select the most stable mRNAs that could serve as candidate reference genes we first focused on uEV samples from our DKD studies that included men only. This choice was due to expected and higher sample-linked [9] and also biological heterogeneity in the women's cohorts. Datasets were analyzed separately to avoid batch effects i.e., isolation workflows (n = 20 samples), in-column DNAse treatment (n = 19 samples), technical dataset (type of collection, pre-clearing, and replicability, see methods) (n = 39 samples), and DKD cohort 1 (T1D, men) (n = 72 samples). Of note, NG isolation workflow data and storage temperature dataset were excluded from the analysis due to the low expression level of many mRNAs (for raw counts, see Table S3). The top 100 uEV genes with the lowest CV were selected from each dataset and the genes overlapping between all of them were selected for further analysis. We found 32 uEV genes in common between the datasets ( Figure 4A).
We next expanded our reference gene analysis to check the stability of expression including women's uEV samples. Here we searched genes in common between the DKD male (32 uEV stable mRNAs from first search) and DKD cohort 2 (T2D, women) (n = 30 samples) using again the top 100 uEV RNA with low CV (in cohort 2), which showed 18 mRNAs in common ( Figure 4B). Finally, we assessed whether some of these 18 mRNAs could also be found from the PCa dataset (n = 8 samples) listing the top 100 uEV mRNAs with low CV. This analysis showed 11 mRNAs in common (HSPD1, SRSF3, VAPA, RAB1A, MORF4L1, PGK1, RHOA, UBE2D3, DAZAP2, UBC, ACTG1) with low CV ( Figure 4C, Table 6).     We analyzed the counts per million (CPM) of the stable genes across samples. In addition, we included GAPDH, a commonly used reference gene, and a gene with high CV (UPK1A) for comparison (Table 6). CPM analysis showed that CPM variation of ACTG1 across samples was similar to the variation observed for GAPDH (both with high and comparable CPM, the rest of the stable mRNA had lower CPM than ACTG1 and GAPDH) but in both cases the variation was low compared to the gene with the highest CV (UPK1A) in all datasets ( Figures 5, 6, S1 and S2). For visualization of CPM values across samples, the candidate reference genes were sorted by decreasing standard deviation (SD) value. The 5 genes with the lowest SD value are plotted in Figures 5 and 6A-D and the remaining 6 genes are plotted in Figures S1 and S2A,B. We also summarized the data in boxplots to visualize the CPM dispersion per gene ( Figures 5E-H and 6C,D). Table 5. miRNAs dysregulated in uEV/urine/urinary sediments from individuals with DKD. Cluster 1 and 4 (see Figure 3B) miRNAs and association with diabetic kidney disease or diabetic kidney disease associated mechanisms in kidney and/or other cells. Acute kidney injury (AKI), diabetic kidney disease (DKD), type 2 diabetes (T2D), urinary extracellular vesicle (uEV).

Regulation in uEV/Urine/Urine Sediments Literature
Examples of Association with Diabetic Kidney Disease or Kidney Diseases, or Pathways Associated with DKD (e.g., Fibrosis, Inflammation, Autophagy, and Oxidative Stress) In hyperglycemic conditions, expression levels reduced in HK-2 cells and epithelial-mesenchymal transition was increased [99].

miR-221-3p down down
In HUVEC cells, hyperglycemia induced this miRNA and was associated with impairment of endothelial cell migration and homing [157].   It has been suggested that a combination of reference genes could provide a more reliable and accurate normalization approach compared to individual reference genes. For generating such a normalization factor, it is important that genes are not co-regulated. In order to spot genes that may be co-regulated, we examined the functions and associated biological processes of the stable genes. As shown in Table 7, the reference gene's functions (at the protein level) are varied including protein folding, glycolysis, signaling cascades, intracellular vesicular trafficking, and splicing. They also participate in several different prominent pathways. Of note, UBE and UBE2D3 both ubiquitylate proteins. Moreover, an analysis of protein interaction (based on experimental evidence from literature) using STRING showed interaction of UBC with UBE2D3 and DAZAP2 and of MORF4L1 with ACTG1 ( Figure 7). In addition, RHOA is involved in some biological processes shared with other stable genes i.e., with RAB1A (cell migration and substrate adhesion-dependent cell spreading), ACTG1 (response to mechanical stimulus and regulation of focal adhesion assembly), DAZAP (positive regulation of protein serine/threonine kinase activity), and VAPA (positive regulation of I-kappaB kinase/NF-kappaB signaling) ( Table 7). We analyzed the counts per million (CPM) of the stable genes across samples. In addition, we included GAPDH, a commonly used reference gene, and a gene with high CV (UPK1A) for comparison (Table 6). CPM analysis showed that CPM variation of ACTG1 across samples was similar to the variation observed for GAPDH (both with high and comparable CPM, the rest of the stable mRNA had lower CPM than ACTG1 and GAPDH) but in both cases the variation was low compared to the gene with the highest CV (UPK1A) in all datasets ( Figures 5, 6, S1 and S2). For visualization of CPM values across samples, the candidate reference genes were sorted by decreasing standard deviation (SD) value. The 5 genes with the lowest SD value are plotted in Figures 5 and 6A-D and the remaining 6 genes are plotted in Figures S1 and S2A-B. We also summarized the data in boxplots to visualize the CPM dispersion per gene ( Figure 5E-H and Figure 6C,D).     It has been suggested that a combination of reference genes could provide a more reliable and accurate normalization approach compared to individual reference genes. For generating such a normalization factor, it is important that genes are not co-regulated. In order to spot genes that may be co-regulated, we examined the functions and associated biological processes of the stable genes. As shown in Table 7, the reference gene's functions (at the protein level) are varied including protein folding, glycolysis, signaling cascades, intracellular vesicular trafficking, and splicing. They also participate in several different prominent pathways. Of note, UBE and UBE2D3 both ubiquitylate proteins. Moreover, an analysis of protein interaction (based on experimental evidence from literature) using STRING showed interaction of UBC with UBE2D3 and DAZAP2 and of MORF4L1 with ACTG1 (Figure 7). In addition, RHOA is involved in some biological processes shared with other stable genes i.e., with RAB1A (cell migration and substrate adhesiondependent cell spreading), ACTG1 (response to mechanical stimulus and regulation of focal adhesion assembly), DAZAP (positive regulation of protein serine/threonine kinase activity), and VAPA (positive regulation of I-kappaB kinase/NF-kappaB signaling) ( Table  7).   Member of the chaperonin family. Essential role in folding and assembly of newly imported proteins in the mitochondria.   Table 7. Cont.    Further we analyzed the stability of the candidate reference genes in datasets from samples that did not perform well in mRNA sequencing i.e., urines stored at −20 • C and uEV isolated with NG isolation workflow. We found that all genes were less stable in samples stored at −20 • C and in NG isolates ( Figure S3). Of note, in many NG samples the candidate reference genes were not detected. Despite of this, HSPD1, SRSF3, VAPA, RAB1A, MORF4L1, PGK1, RHOA, UBE2D3, DAZAP2, UBC, ACTG1, showed to be stable in all other diverse experimental conditions and across disease groups. Thus, they may serve as reference genes for uEV mRNA related research.

Discussion
Urinary EV have been regarded as a promising source of biomarkers [5] and this idea is getting support from an increasing number of studies reporting candidate markers for diseases of diverse etiology [8,9,28,162,163]. However, many obstacles prevent replication of biomarker results and, as a consequence, clinical translation. In this study, we approached three of these obstacles: urine storage, uEV isolation and reference genes in kidney disease transcriptomic research.
The first obstacle is the lack of guidelines to handle and store urine. Urine storage temperature (−20 • C vs. −80 • C) has been shown to affect the size and concentration of uEV [164] and recovery of uEV protein markers but the latter could be sorted out by vortexing samples after thawing [165]. In addition, qPCR-based research has been done on uEV RNA by comparing storage temperatures-including −80 • C, 4 • C, room temperature and 37 • C with variable results [166][167][168]. Our group showed that the global uEV miRNA and mRNA profiles were affected when urines were stored at −20 • C vs. −80 • C [14] and we found sets of downregulated and upregulated genes. As particularly the −20 • C downregulated genes were involved e.g., in carbohydrate or lipid metabolism, the result suggested that −20 • C stored samples are less useful for studying kidney diseases. Here, analyzing further the data, we found that a striking 75% of the −20 • C downregulated miRNAs were associated with various kidney diseases (Table 2). Thus, the result reinforces the idea of avoiding urine samples stored at suboptimal temperatures [14], because such samples might not contain putative valuable disease markers anymore.
We also observed that despite the normalized differential expression, miRNAs that were up-regulated in samples stored at −20 • C had still lower raw counts than the same miRNAs in −80 • C stored samples. Thus, the result was the opposite than what the normalized counts showed. TMM normalization is a method based on library size that uses scaling of raw reads to render library sizes comparable which is needed for differential expression analysis [169]. Considering that the library size of the −20 • C samples was smaller (higher number of 0 raw counts and lower expression in general) than that of the −80 • C samples, the upregulation of miRNAs in −20 • C samples may be an artifact of the data analysis. Further, we showed that kidney-RNAs were detected in small quantities after storage at −20 • C (Figure 1). In particular, kidney enriched mRNAs in uEV isolates were highly affected since almost one third (30%) of them were not detected at all in samples stored at −20 • C. Our results agree with and provide further support to a set of urine storage guidelines that has been published recently [17].
The second obstacle is the lack of standardization of uEV isolation methods. Currently, many isolation principles and workflows are available [170] and it is well known that they typically produce different results [13,15,16,168,171]. Obviously, this represents a problem for study comparisons, even if reporting guidelines now help to identify differences, facilitate replication and/or explain the lack of it [12,172]. Prior studies have explored the effect of uEV isolation workflows on uEV RNA sequencing profiles focusing on miRNA sequencing [15,[20][21][22]. We have previously demonstrated that the uEV isolation workflow (UC, HFD and NG) has a surprisingly variable impact on the miRNA and mRNA profiles [13]. Specifically, global miRNA profile analysis suggested that the three workflows were similar overall or-at least-did not differ systematically. This was in contrast to the global mRNA results, where UC and HFD were similar while NG clustered separately [13]. Here, by analyzing the top expressed miRNAs of the kidney, we found that the expression of 18 miRNAs was lower for a set of HFD samples compared to UC and NG samples ( Figure 2). While for 13 miRNAs the differences in the expression levels between UC and HFD were small (3-35%) and could be related to technical bias, for 5 miRNAs differences were in the range of 35-55% and could represent real differences. Of note, miRNA hsa-miR-101-3p (a top kidney expressed miRNA) was significantly downregulated in HFD relative to UC samples [13]. The observation that methods capture slightly different kidney enriched miRNAs could be explained, at least partly, by differences in the uEV populations and/or non-EV components captured by the isolation workflows [13]. On the other hand, analysis of kidney enriched mRNAs was consistent with the global analysis i.e., NG samples could not capture these genes as well as UC and HFD ( Figure 2). Thus, this shows that for a specific research topic like kidney research, it is best to evaluate the differences between methods using specific end-point targets (kidney-RNAs) in addition to a global level analysis.
In addition to urine storage temperature and uEV isolation workflows, many other preanalytical experimental conditions impact the analytical endpoints as well [12]. As experimental set-ups can differ greatly between studies [18], biomarker results cannot be replicated hindering translation of findings to clinic [173]. Considering all the variability, we were positively surprised that our UC workflow replicated some of the previous results from DKD miRNA studies using uEV or urine/urine sediments (see Table 4) i.e., a set of the miRNAs separated experimental groups (healthy controls vs. T1D with macroalbuminuria) (Figure 3). Further, specifically eight miRNAs followed the same regulation direction in both the literature and our dataset. Three of the eight miRNAs had also evidence of DKD-linked dysregulation in kidney or plasma of individuals with DKD and two showed to be dysregulated in kidney cell lines under hyperglycemic conditions in vitro (in addition to evidence in urine/uEV) and all of them related to pathological mechanisms in DKD such as fibrosis and impaired autophagy [174][175][176] (Table 5). However, our dataset had a low number of samples and thus replication of findings in bigger and more varied cohorts is still needed. Interestingly, we found 12 miRNAs in common between the two literature review generated DKD miRNA lists. These miRNAs were associated with pathological pathways involved in DKD such as hypertrophy and fibrosis. These results suggest that the uEV capture a specific subset of DKD-associated miRNA reflecting the differences in the tissue.
The third obstacle jeopardizing the biomarker replication is the lack of normalizers-in the EV transcriptomics field, this means lack of stable reference genes across e.g., many preanalytical workflows and disease conditions. Urinary EV reference genes are a poorly explored topic. While some recommendations exist on how to select reference genes or normalize gene expression data [29,30,177], there are only few studies on this topic in urine. GAPDH, a commonly used reference gene, and UBC were the most stable in EV derived from liver and breast cancer cell lines [33]. In contrast, Singh et al. (2022) tested five common reference genes (including GAPDH) and found that B2M and RPL13 were the most stable in uEV isolated using PEG from patients with renal graft dysfunction. Thus, the stability of GAPDH appears to be dependent on the disease, biofluid and/or EV isolation method. In this study, using datasets available in our original publications [6,9,13,14], we discovered 11 mRNAs (HSPD1,SRSF3, VAPA, RAB1A, MORF4L1, PGK1, RHOA, UBE2D3, DAZAP2, UBC, ACTG1) that were stable across datasets including different pre-analytical conditions, men and women, healthy controls, T1D and T2D patients with different albuminuria status; and prostate cancer patients (Figures 4-6, S1 and S2). However, in poor quality sequencing datasets (urine stored at −20 • C and NG isolation workflow), the candidate genes showed poor stability i.e., high CV ( Figure S3). Of note, our finding regarding UBC stability in uEV is concordant with findings of Gorgi Bahri (2021) in cell culture media derived EV. Further, even though GAPDH was not one of the most stable mRNAs, it was less variable than UPK1A (a highly variable mRNA selected to compare our candidate reference mRNAs).
One of the reasons that prevents the study of uEV reference genes is the lack of data. Many studies have focused on miRNA/small RNA sequencing, but only few on RNA sequencing. Moreover, the few studies with a good amount of uEV samples from patients [59,178] do not have the associated raw sequencing data and/or raw sequencing counts freely available (to date). Local regulations could hinder the publication of sequencing data but raw count data describing all the pre-processing and alignment procedures is also helpful for the research community. Such practicalities should be considered for the informed consent and ethical permissions. Given more available datasets in future, the stability of our 11 candidate mRNAs could be further tested and a combination of selected genes used as reference genes e.g., by calculating the geometrical mean [179]. As it is recommended that reference genes should belong to different biological pathways and that expression is regulated independently for each reference gene, caution should be taken if using UBC and UBED3 and/or DAZAP2 or MORF4L1 and ACTG1 together since an analysis using STRING showed evidence of experimentally validated interactions in the literature (Figure 7). In addition, UBC and UBE2D3 are co-expressed (https://string-db.org/, accessed on 28 April 2023) and form a protein complex [180]. Moreover, RHOA shared biological processes with RAB1A, ACTG1, DAZAP and VAPA (Table 7). It is good to keep in mind that the uEV reference mRNA candidates could be contributing to biological processes associated with kidney diseases e.g., RAB1A contributes to autophagy. Nevertheless, their stability in our datasets which included isolates from healthy and type 1 diabetic individuals with and without DKD, and diverse preanalytical setups (roughly 200 isolates) motivates further experimental validations.
We acknowledge that a full understanding of the effect of all pre-analytical choices and pathophysiological conditions for transcriptomic applications calls for big testing resources. Ideally, cross-laboratory testing should be performed, and laboratories could implement reference materials, a gold standard isolation protocol, and housekeeping normalizers. Our results here help towards this goal by providing new insights for the three key obstacles hindering uEV biomarker validation. For the first two, urine storage and uEV isolation, we found that it is important to study the raw counts in addition to the normalized counts and kidney-RNAs in addition to the global transcriptome-they offer different although complementary results. For the third, the reference genes, we provide 11 mRNAs that could be tested for qPCR normalization in the context of DKD and prostate cancer. Finally, despite the known and hereby addressed variability between uEV studies, we successfully replicated many previously found urine/uEV/urinary pellet miRNAs associated with DKD in our UC DKD dataset. We regard this as an encouraging result for the reproducibility of uEV biomarker research.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/genes14071415/s1, Figure S1: The mRNA sequencing read counts of six of the eleven candidate reference genes across samples in pre-analytical and DKD uEV datasets from men. Figure S2: The mRNA sequencing read counts of six of the eleven candidate reference genes across samples in uEV datasets from DKD study of women and from prostate cancer patients. Figure S3: The mRNA sequencing read counts of candidate reference genes in storage temperature and NG datasets. Table S1: Dysregulated miRNAs in samples stored at −20 • C, raw and normalized counts. Table S2: Kidney-RNAs raw and normalized counts for storage temperature dataset. Table S3: Kidney-RNAs raw and normalized counts for Isolation workflows. Funding: This project has received funding from the Innovative Medicines Initiative 2 Joint Undertaking under grant agreement No 115974. The JU receives support from the European Union's Horizon 2020 research and innovation programme and EFPIA and JDRF. Any dissemination of results reflects only the author's view; the JU is not responsible for any use that may be made of the information it contains.Open access funding provided by University of Helsinki.

Institutional Review Board Statement:
The original studies, where the datasets were generated, were conducted in accordance with the Declaration of Helsinki, and approved by the pertinent Institutional Review Board as reported in [6,9,13,14].
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study as reported in [6,9,13,14].
Data Availability Statement: Datasets used in this publication are available in the original publications mentioned in Table 1.