Deciphering the Non-Coding RNA Landscape of Pediatric Acute Myeloid Leukemia

Simple Summary Survival rates for children with acute myeloid leukemia have significantly improved in recent decades. Still, 20–30% will succumb to therapy-related toxicity and relapse. Research has shown that leukemic stem cells (LSCs) in particular are responsible for relapse. Therefore, the characterization of the LSC is needed to improve the current treatment options for and survival of these pediatric patients. Previously, research has focused mainly on the protein-coding part of the human genome, but recently the focus has also shifted to non-coding genes, including long non-coding RNA (lncRNA) and microRNA (miRNA). The aim of this study was to investigate the expression of lncRNAs and miRNAs in pediatric AML subpopulations (LSCs and leukemic blasts) and their normal counterparts (hematopoietic stem cells and control myeloblasts). We provide here a unique set of non-coding RNAs with a potential role in the pathogenesis of pediatric AML, paving the way for further translational research studies. Abstract Pediatric acute myeloid leukemia (pedAML) is a heterogeneous blood cancer that affects children. Although survival rates have significantly improved over the past few decades, 20–30% of children will succumb due to treatment-related toxicity or relapse. The molecular characterization of the leukemic stem cell, shown to be responsible for relapse, is needed to improve treatment options and survival. Recently, it has become clear that non-coding RNAs, including long non-coding RNAs (lncRNAs) and microRNAs (miRNAs), play a role in the development of human diseases, including pediatric cancer. Nevertheless, non-coding RNA expression data in pedAML are scarce. Here, we explored lncRNA (n = 30,168) and miRNA (n = 627) expression in pedAML subpopulations (leukemic stem cells (LSCs) and leukemic blasts (L-blasts)) and their normal counterparts (hematopoietic stem cells and control myeloblasts). The potential regulatory activity of differentially expressed lncRNAs in LSCs (unique or shared with the L-blast comparison) on miRNAs was assessed. Moreover, pre-ranked gene set enrichment analyses of (anti-) correlated protein-coding genes were performed to predict the functional relevance of the differentially upregulated lncRNAs in LSCs (unique or shared with the L-blast comparison). In conclusion, this study provides a catalog of non-coding RNAs with a potential role in the pathogenesis of pedAML, paving the way for further translational research studies.


Introduction
Pediatric acute myeloid leukemia (pedAML) is a rare hematological disease that accounts for 20% of all pediatric leukemias [1]. With current chemotherapeutic regimens, the five-year survival rate has reached a plateau of 70% [2,3]. Unfortunately, 30% to 40% of the good responders will experience relapse, often correlated with specific genetic subgroups [2]. The high relapse rate is thought to be caused by the persistence of leukemic stem cells (LSCs). Therefore, therapies that not only target the rapidly dividing leukemic blasts (L-blasts) but also the LSCs are needed to improve the outcomes.
Efforts to characterize alterations in the transcriptional program of the disease have led to the discovery of novel AML-specific targets, albeit mostly focused on the protein-coding part of the transcriptome [4][5][6][7]. The identification of AML-specific therapeutic targets is challenging as co-expression in the hematopoietic stem cell (HSC) is often observed, leading to severe cytopenia when targeted [8]. As AML is a biologically complex disease, dictated in part by mutational, clonal, and epigenetic heterogeneity, the knowledge on its pathogenesis is still evolving [9]. Recently, an LSC-specific signature of 111 lncRNAs characterizing cytogenetic normal adult AML was uncovered, revealing potentially interesting novel targets for therapy [10]. The role and function of non-coding transcripts are being increasingly explored, further expanding our knowledge on the pathogenesis of pedAML and providing therapeutic and diagnostic opportunities. Long non-coding RNAs (lncRNAs), non-coding transcripts of >200 nucleotides, constitute a heterogeneous group of non-coding RNAs acting as key regulators of gene transcription, protein translation, and epigenetic regulation [11]. Interestingly, compared to messenger RNAs, the expression of lncRNAs seems to be much more tissue specific [12,13], making them highly interesting therapeutic targets. Research has shown that lncRNAs can harbor binding sites for miRNAs-non-coding transcripts of approximately 18-25 nucleotides that negatively regulate gene expression post-transcriptionally-acting as endogenous competitors [14][15][16][17]. Subsequently, lncRNAs can sequester or sponge miRNAs, resulting in their dysregulated activity on physiologically relevant target genes. The process of sponging has been widely studied in diverse cancer entities [18], including AML.
Here, we characterized the lncRNA expression in the LSCs and L-blasts in pedAML and identified key differentially expressed (DE) lncRNAs compared to their respective healthy counterparts, taking into account a threshold for their expression in the HSCs. To unravel potential endogenous competitors, miRNA-lncRNA networks of anti-correlated miRNAs to the identified DE-lncRNAs were examined. Furthermore, a gene set enrichment analysis (GSEA) was performed to uncover associated pathways. Altogether, our results highlight the role of lncRNAs in the pathogenesis of pedAML and contribute to deciphering the biological complexity of the disease. To this end, we identified key non-coding RNAs that need further evaluation regarding their roles as novel biomarkers for risk stratification, follow-up, and targeted therapy.

Patients and Controls
Bone marrow (BM) and/or peripheral blood (PB) from four pedAML patients with diverse molecular characteristics was selected based on cell availability (>50 × 10 6 after routine workup) and CD34+ positivity (≥1%) ( Table 1). As a control, cord blood (CB = 3) obtained after full-time delivery was used. All subjects gave their informed consent for inclusion before participation. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Ethics Committee of the University Hospital of Ghent (EC2015-1443 and EC2019-0294).

Cell Sorting
Mononuclear cells were isolated by Ficoll density gradient (Axis-Shield Diagnostics Ltd., Dundee, Scotland), complemented by CD34 isolation if the CD34 expression was <50% (CD34 MicroBead Kit, Milteny Biotec B.V., Leiden, The Netherlands). Cell sorting was performed to isolate CD34+/CD38− and CD34+/CD38+ cells from patients and controls, defined as LSCs and HSCs, and L-blasts and control blasts (C-blasts), respectively. Sorting was performed according to the method of Depreter and colleagues [19], and a detailed description can be found in the Supplemental Material and Methods. Sorted cells were collected in RPMI supplemented with 50% FCS, and a post-sort purity of >90% was reached. Antibodies used for sorting are shown in Table S1. Sorted cells were spun down (10 min, 3000 rpm, 4 • C) and resuspended in 700 µL of TRIzol for RNA extraction.

RNA Isolation
RNA was extracted using the miRNeasy Mini or Micro Kit (QIAGEN Benelux B.V. -Belgium, Antwerp, Belgium) in combination with on-column DNase I digestion (RNAse-Free DNase set, Qiagen) according to the manufacturer's instructions. RNA concentrations were measured by Nanodrop (ThermoFisher Scientific BVBA, Merelbeke, Belgium) or Qubit RNA HS Assay (Invitrogen, Merelbeke, Belgium).

Micro-Array Analysis
Three LSC and four L-blast fractions, next to two HSCs and three C-blast controls, were profiled on a custom-designed Agilent 8 × 60 K humane gene expression micro-array platform by Biogazelle. This micro-array contained probes for all human protein-coding genes (n = 27,071) and lncRNA probes (n = 30,168) based on the LNCipedia database for lncRNAs [20]. Micro-array profiling was performed according to Depreter et al. [19], and details can be found in Supplementary Methods. Differentially expressed (DE) lncRNAs were identified based on |log2FC| > 2 and adjusted p-values (adj. p) ≤ 0.05 using the limma package (R Bioconductor). However, if a differentially expressed gene was represented by multiple probes with an equal transcript, the one with the highest fold change was selected for further downstream analyses. Raw data can be found under GSE 128,103 (released on 9 March 2022).

Small RNA-Sequencing Analysis
Small RNA expression profiling was performed using Biogazelle on the same sample cohort. Libraries for RNA sequencing were prepared using Biogazelle and a TruSeq small RNA library kit (Illumina) according to the manufacturer's instructions. The resulting small RNA libraries were concentrated via ethanol precipitation and quantified using the Qubit 2.0 Fluorometer before sequencing with a read length of 75 bp on a NextSeq 500 sequencer (Illumina). Mapped reads were subsequently annotated to mature miRNAs using genome annotation data from Ensemble release 84, UCSC, and miRbase v21. Prior to normalization, data were filtered using a cut-off of 4 reads. miRNA expression data were normalized based on the total read count per sample. After normalization, a pseudo-count of 1 to each read miRNA count was added before log transformation. DE-miRNAs were examined based on |log2FC| > 2 and adj. p ≤ 0.05 using the EdgeR and limma packages (R Bioconductor). Raw data can be found under GSE 196,886 (released on 16 August 2022).

miRNA-lncRNA Network Constuction
Pearson correlations between miRNAs and LSC DE-lncRNAs as miRNAs and shared DE-lncRNAs were computed to construct a miRNA-lncRNA network, using the base R cor() function. The threshold for anti-correlated miRNAs was set at a Pearson correlation coefficient of −0.7.

Gene Set Enrichment Analysis
A Pearson correlation matrix was computed between the unique upregulated LSC DE-lncRNAs, as well as the shared DE-lncRNAs, and a set of protein-coding genes (n = 27,071) using the base R cor() function. For every (anti-)correlated (|ρ| > 0.7) protein-coding gene, the mean of the absolute expression values in the four fractions (LSC, L-blast, HSC, and C-blast) was calculated. A pre-ranked gene set enrichment analysis (GSEA) was performed using GSEA 4.1.0 (by Broad Institute) for the respective (anti-)correlated protein-coding genes of each selected DE-lncRNA to explore pathways of involvement using the hallmark gene set (n = 50) (www.gsea-msigdb.org, accessed on 22 December 2021) The HGNC ID nomenclature was used as a chip platform for the GSEA. The false discovery rate (FDR) was set at <10%. The genes TAF9 and HNRNPA2B1 were removed from the enrichment analysis, as the different probes were both correlated and anti-correlated to the respective DE-lncRNA, resulting in conflicting results.

Identification of Differentially Expressed lncRNAs in pedAML Subpopulations
To identify lncRNAs expressed in different pedAML cellular fractions, we performed lncRNA profiling using RNA isolated from CD34+/CD38+ (n = 4, L-blast) and CD34+/ CD38− (n = 3, LSC) sorted cell populations from four pedAML patients. In addition, sorted CD34+/CD38+ (n = 3, C-blast) and CD34+/CD38− (n = 2, HSC) from CB were profiled to examine lncRNA expression in their normal counterparts. Principal component analysis showed that the different subpopulations cluster correctly and are divergent from the healthy CB controls ( Figure 1  the hallmark gene set (n = 50) (www.gsea-msigdb.org, accessed on 22 December 2021) The HGNC ID nomenclature was used as a chip platform for the GSEA. The false discovery rate (FDR) was set at <10%. The genes TAF9 and HNRNPA2B1 were removed from the enrichment analysis, as the different probes were both correlated and anti-correlated to the respective DE-lncRNA, resulting in conflicting results.

Identification of Differentially Expressed lncRNAs in pedAML Subpopulations
To identify lncRNAs expressed in different pedAML cellular fractions, we performed lncRNA profiling using RNA isolated from CD34+/CD38+ (n = 4, L-blast) and CD34+/CD38 (n = 3, LSC) sorted cell populations from four pedAML patients. In addition, sorted CD34+/CD38+ (n = 3, C-blast) and CD34+/CD38 (n = 2, HSC) from CB were profiled to examine lncRNA expression in their normal counterparts. Principal component analysis showed that the different subpopulations cluster correctly and are divergent from the healthy CB controls ( Figure 1  To further delineate potential interesting therapeutic targets, a threshold for the absolute expression in the HSC fraction was set to expression values less than seven, corresponding to the black corner expression plus one and ensuring low or no expression in the HSC fraction. This resulted in nine upregulated DE-lncRNAs, of which three were unique to the LSC-HSC differential expression analysis ( Figure 2, Table 2). Although the majority of the nine upregulated DE-lncRNAs could not be examined in the LncScape database (www.lncscape.de, accessed on 24 March 2022), probably as a consequence of the rapidly evolving knowledge on and discovery of novel lncRNAs, we could observe low HSC expression for lnc-GSG1-1, in agreement with our results. In the L-blast fraction, 37 upregulated DE-lncRNAs remained, of which 31 were assigned to the L-blast versus C-blast comparison ( Figure 2). To further delineate potential interesting therapeutic targets, a threshold for the absolute expression in the HSC fraction was set to expression values less than seven, corresponding to the black corner expression plus one and ensuring low or no expression in the HSC fraction. This resulted in nine upregulated DE-lncRNAs, of which three were unique to the LSC-HSC differential expression analysis ( Figure 2, Table 2). Although the majority of the nine upregulated DE-lncRNAs could not be examined in the LncScape database (www.lncscape.de, accessed on 24 March 2022), probably as a consequence of the rapidly evolving knowledge on and discovery of novel lncRNAs, we could observe low HSC expression for lnc-GSG1-1, in agreement with our results. In the L-blast fraction, 37 upregulated DE-lncRNAs remained, of which 31 were assigned to the L-blast versus C-blast comparison (Figure 2).

Identification of Potential Functional Networks
In order to assess the sponging activity of lncRNAs in pedAML, we performed a small RNA sequencing on the samples used for the lncRNA expression profiling. A total of 627 miRNAs were examined for differential expression analysis, and 22 were found to be significantly differentially expressed (|logFC| > 2 and Adj. p value ≤ 0.05) in the LSCs compared to the HSCs, with 4 upregulated and 18 downregulated (Figure 4a, Supplementary Table S4). Comparison of L-blasts to C-blasts revealed 60 significantly differentially expressed miRNAs, with 25 upregulated and 26 downregulated (Figure 4b, Supplementary Table S5).

Identification of Potential Functional Relevance of Unique LSC DE-lncRNAs and Shared DE-lncRNAs
To predict the functional relevance of the DE-lncRNAs, a pre-ranked GSEA was performed on a gene set of correlated protein-coding genes with a Pearson correlation coefficient of >0.7. The focus was narrowed to the unique DE-lncRNAs in the LSCs and the shared DE-lncRNAs. Involvement in various cancer pathways was examined using the hallmark gene set (MSigDB) to identify potential functional relevance in cancer development.

Discussion
Although the survival of pedAML has steadily increased during the past decade, relapses occur in almost 40% of patients and remain an important cause of death [2]. The further refinement of risk-stratification strategies and the identification of novel therapeutics targeting the LSC fraction, thought to be responsible for relapse, are warranted. The recently increased interest in studying the non-coding part of the human genome has shown the involvement of non-coding actors, including lncRNAs and miRNAs, in the development of solid and hematopoietic cancers, revealing potential cancer-associated targets. In this study, we identified a novel set of DE-lncRNAs and their associated transcriptional network in LSC and L-blast subpopulations of pediatric AML patients.
Although they have not yet been previously described, three lncRNAs (lnc-CHST2-2, lnc-EPS15L1-3, and lnc-KLHL25-4) were found to be uniquely upregulated in the LSC fraction of pedAML patients. Taken together with their low expression in HSCs, these three lncRNAs might be excellent targets for further therapeutic developments using, e.g., antisense oligonucleotides, as previously demonstrated in other cancer types [21,22]. Similarly, two lncRNAs (LINC01794 and lnc-TBX20-3) are uniquely downregulated in the LSCs. These two should be further explored as potential prognostic biomarkers, and uncovering their functional network may lead to new therapeutic opportunities.
Further corroborating our results, lnc-RNFT2-1 and lnc-CLVS1-1, identified as lncRNAs uniquely differentially expressed in the L-blast fraction, have previously been shown to be differentially expressed in CD34+ hematopoietic stem and progenitor cells of patients with MDS compared to healthy donors through micro-array profiling by Liu and colleagues [23]. As MDS patients bear a high risk of developing AML and patients with MDS-AML often have a worse prognosis [24], it will be interesting to further elaborate on the functional roles of lnc-RNFT2-1 and lnc-CLVS1-1 in AML and MDS.
Several lncRNAs (lnc-GSG1-1, lnc-RGMA-1, lnc-KMT2E-1, LINC01220, LINC00649, and lnc-LYST-4) were upregulated in LSCs as well as in L-blasts in comparison to their normal counterparts. Interestingly, the knockdown of LINC01220 in endometrial carcinoma has been shown to inhibit proliferation and induce apoptosis, suggesting its therapeutic potential [25]. In contrast to our findings, LINC00649 expression was recently shown to be aberrantly low in AML patients [26], although in many other cancer types, high expression of LINC00649 has been shown to be crucial in driving carcinogenesis [27][28][29]. Functional work will need to further establish the role of LINC00649 specifically in the context of pedAML.
We aimed to predict the functional relevance of the identified DE-lncRNA through a pre-ranked GSEA of (anti-)correlated (|ρ| > 0.7) protein-coding genes. Interestingly, CD69, known to be responsible for the self-renewal capacity of LSCs, was enriched in the hallmark pathways TNFA signaling via NFKB, apoptosis, inflammatory response, and interferon-gamma response for lnc-EPS15L1-3 and lnc-KMT2E-1 as well as all shared DE-lncRNAs. However, exact knowledge of the function of CD69 in AML is lacking [45]. Furthermore, all identified upregulated DE-lncRNAs (shared and unique) were correlated with PPP1R15A, which was enriched in the TNFA signaling via NFKB, MTORC1 signaling, hypoxia, p53, and TGF beta signaling hallmark pathways. Although it is not linked to hematological malignancies, PPP1R15A is part of a hypoxia-related prognostic signature for breast cancer [46]. Moreover, SLC2A3 was also correlated with all DE-lncRNAs (unique and shared) and was previously shown to be upregulated in the L-blasts in pedAML [19]. IER3 and NAMPT, which correlated with all but two (lnc-CHST2-2 and lnc-LYST-4) of the upregulated DE-lncRNAs, both have a therapeutic link with AML. On the one hand, IER3 is normally downregulated in AML, and vorinostat increases IER3 expression to favor cell cycle arrest, differentiation, and apoptosis of malignant cells [47]. On the other hand, NAMPT inhibitors can selectively induce the apoptosis of AML stem cells by disrupting lipid homeostasis [48].
As the aim of this study was to find new therapeutic targets by identifying and exploring functional networks in which lncRNAs play a pivotal role, the focus was narrowed to uniquely upregulated LSC DE-lncRNAs and shared upregulated DE-lncRNAs. However, downregulated lncRNAs also have an important role and may emerge as new prognostic markers predicting the outcome of pedAML patients. Although the exploration of downregulated targets is beyond the scope of our study, our data set also provides leads for further research on downregulated lncRNAs.
This study has some limitations, including its design and the limited number of participants. Consequently, the provided data should be interpreted with caution within the framework of generalizability. In addition, we acknowledge that extensive in vitro and in vivo validation of several of the identified targets is needed to fully uncover the potential of the presented data. Nevertheless, here we provide a repository of potential targets that allow us to fulfill the imperative need for alternative therapeutic strategies in pedAML.

Conclusions
In conclusion, this study provides a unique set of LSC-and L-blast-specific lncRNAs in pedAML and their interactions with miRNAs, ultimately impacting gene expression. The generated data provide a rich source of promising targets, most of which have not been previously described in pedAML, and networks that could serve as anchor points for further functional experimental research.