1. Introduction
Lung cancer ranks first among all cancer types in incidence and mortality, accounting for 18.4% of cancer-related deaths worldwide [
1]. Despite the continued development of novel treatment methods, including targeted and immuno-oncology therapies, which have significantly improved the survival rate of lung cancer patients, the 5-year survival rate of these patients remains <20% [
2]. Therefore, identification and validation of prognostic markers useful for screening patients most likely to respond to a given therapy are urgently needed.
Prognostic markers include gene mutations, single-nucleotide polymorphisms of genes or regulatory elements, as well as levels of proteins, mRNAs, and noncoding RNAs. In particular, advances in global transcriptome analysis have promoted attempts to exploit RNA-expression levels as prognostic markers. For example, the analysis of reverse transcription-polymerase chain reaction (RT-PCR) data obtained from 147 patients with non-small cell lung cancer (NSCLC), the most common type of lung cancer, identified a six-gene signature (
STX1A,
HIF1A,
CCT3,
HLA-DPB1,
MAFK, and
RNF5) as a prognostic marker of poor patient outcomes [
3]. In another study, microarray data from formalin-fixed paraffin-embedded (FFPE) samples from 55 NSCLC patients revealed a 59-gene prognostic signature [
4]. Microarray profiling of microRNAs (miRNAs) in 104 lung adenocarcinoma (LUAD; a major subtype of NSCLC) patient samples revealed that high
hsa-mir-155 and low
hsa-let-7a-2 expression correlated with poor patient survival [
5]. These attempts were groundbreaking but not particularly successful, possibly due to their small sample sizes, inconsistent platforms, inappropriate feature-processing steps, or lack of reliably robust methods for effectively analyzing high-dimensional data [
6].
miRNAs, as the most extensively studied noncoding RNAs, are small single-stranded RNAs (19–25 nucleotides in length) and endogenous suppressors of target genes [
7,
8]. miRNAs carry sequences complementary to the 3′-untranslated regions (3′-UTRs) of target mRNAs and bind to these regions through Watson–Crick base pairing. Perfectly matched binding of miRNAs to 3′-UTRs leads to mRNA degradation, whereas imperfectly matched binding leads to translational repression. By suppressing target gene expression or protein translation, miRNAs regulate diverse physiological and pathological processes, including cancer [
8]. miRNAs can either promote or repress cancer development and progression, depending on their target genes. Numerous miRNAs act as oncogenes by negatively regulating tumor suppressors. For example, miR-21 is upregulated in colon cancer and promotes cell growth and invasion by repressing the tumor suppressor
PTEN [
9]. miR-183 is overexpressed in colon cancer and represses
EGR1, which encodes a transcription factor that acts as a tumor suppressor, to promote tumor cell migration [
10]. Conversely, tumor-suppressor miRNAs can inhibit tumorigenesis, epithelial-to-mesenchymal transition (EMT), and metastasis by suppressing oncogenes. In lung cancer, let-7 controls cellular proliferation by negatively regulating the
KRAS oncogene [
11]. Additionally, miR-200 family members suppress EMT, migration, invasion, and metastasis of lung cancer cells by directly repressing
ZEB1, a gene encoding an EMT-inducing transcription factor [
12]. These represent examples of attempts to use miRNAs as biomarkers for cancer detection, diagnosis, prognosis, and drug efficacy [
13].
We recently developed a novel prognosis-associated feature-selection framework called Cascaded Wx (CWx), an artificial neural network-based algorithm that ranks features (genes) according to cancer patient survival by training neural networks with high- and low-risk cohorts in a cascading fashion [
6]. We used CWx to analyze transcriptome data (20,501 genes) from LUAD patients (
n = 507) in The Cancer Genome Atlas (TCGA) and demonstrated the superiority of CWx over other models for identifying prognosis-related genes. In the present study, we applied the CWx platform to LUAD TCGA miRNA-expression data to identify miRNA features associated with LUAD patient survival. Using NanoString miRNA assays in FFPE lung tumor samples, we then validated several CWx-selected miRNAs as prognostic markers for predicting survival in LUAD patients.
3. Discussion
Identification and validation of clinically applicable prognostic markers that accurately predict patient survival or drug response are crucial to achieving better treatment outcomes and improved survival rates in lung cancer. Numerous studies have pursued this goal over several decades. Recently, one study identified both circulating tumor cells exhibiting an active EMT status (vimentin+ and EGFR+) and tumoral expression of
AXL mRNA as prognostic factors for OS and RFS of patients with early stage resectable NSCLC [
25]. Using a different approach, we applied a neural network-based CWx framework to extract miRNA markers most highly associated with LUAD patient survival, profiled their expression levels in LUAD patient samples using NanoString technology, validated their effects on patient survival, and elucidated their functions in lung cancer cells. The results identified miR-374a and miR-374b, both EMT-related miRNAs, as potential prognostic markers associated with poor survival in LUAD patients.
Machine learning algorithms are useful for analyzing large volumes of data, such as genetic information produced by next-generation sequencing (NGS) technologies. Support vector machines [
26], decision trees [
27], and random forest [
28] algorithms have been frequently adopted to extract prognostic features from high-throughput NGS profiling data [
29,
30,
31,
32]. We recently reported that the CWx framework offers improved feature-selection efficiency and higher prognostic prediction accuracy than previous algorithms [
6]. Previous studies report that miRNA signatures show predictive, diagnostic, and prognostic value and are capable of enhancing the efficacy and feasibility of low-dose computed tomography screening in lung cancer patients [
33,
34]. These results suggest that miRNAs can serve as effective lung cancer biomarkers. Therefore, we extended the previously developed CWx framework to miRNA profiling data and extracted miRNA features associated with LUAD patient survival. Moreover, the top miRNAs identified by CWx were validated experimentally and clinically to demonstrate their prognostic potential.
miR-374 family members (miR-374a, -374b, and -374c) play indispensable regulatory roles in diverse physiological and pathological processes, including cancer development and metastasis [
35]. In triple-negative breast cancer, miR-374a is upregulated relative to levels in non-tumor tissues and promotes cell proliferation, migration, and tumor progression in vivo by targeting arrestin-β1 (
ARRB1) [
36]. Additionally, miR-374a activates Wnt/β-catenin signaling by directly targeting
WIF1,
PTEN, and
WNT5A, thereby promoting breast cancer metastasis [
18]. In hepatocellular carcinoma cells, miR-374a promotes cell growth by targeting MIG-6 (
ERRFI1), a negative regulator of EGFR signaling [
23]. Moreover, miR-374b promotes cellular proliferation and inhibits apoptosis in gastrointestinal stromal tumors by targeting
PTEN and activating PI3K/AKT signaling [
37].
By contrast, miR-374a and miR-374b reportedly suppress the progression of some cancers. miR-374b suppresses the migration and invasion of bladder cancer cells by targeting
ZEB2, an EMT-inducing transcription factor [
22], and suppresses cell proliferation, migration, and EMT in ovarian cancer by targeting
FOXP1 [
23]. Even in LUAD cells, miR-374a suppresses cell proliferation and invasion by targeting TGF-α (
TGFA) [
38], and in early stage NSCLC, high expression of miR-374a is associated with improved survival rates [
14]. These conflicting roles were clarified for miR-374a in a study of NSCLC cells by Zhao et al. [
39], which identified dual stage-specific roles: in early-stage NSCLC, miR-374a suppresses cell growth, migration, invasion, and metastasis by targeting cyclin D1 (
CCND1), whereas in advanced-stage NSCLC, it targets the tumor suppressor
PTEN. Accordingly, high miR-374a expression in early-stage NSCLC is associated with improved patient survival, whereas in advanced NSCLC it is associated with shorter survival, which is consistent with the findings of the present study.
The CWx platform also predicted multiple candidate miRNAs beyond miR-374a and miR-374b to be associated with LUAD patient survival. Of these, let-7f is a member of the let-7 family, which includes well-known tumor-suppressor miRNAs that target oncogenes, such as
MYC,
RAS, and
CCND1 [
40]. miR-101 inhibits lung cancer proliferation and metastasis by targeting
ZEB1 [
41], and miR-200c is an EMT-inhibitory miRNA that also targets
ZEB1 [
12]. Additionally, miR-21 is an oncogenic miRNA that targets
PTEN and is frequently upregulated in solid tumors [
42]. These findings suggest that miRNA features derived through the CWx platform are related to cancer development and metastasis. Further studies are needed to validate the pathophysiological functions of these miRNA features in LUAD.
In summary, we conducted an integrated study that combined machine learning, clinical sample profiling, and cellular experiments to predict and validate prognostic miRNA markers associated with LUAD patient survival. The results showed that miR-374a and miR-374b promote cancer cell invasion and that their elevated expression in LUAD patients is associated with poor prognosis. We anticipate that the proposed CWx miRNAs will be useful as LUAD-specific prognostic markers following further experimental and clinical evaluation.
4. Materials and Methods
4.1. Data Acquisition
miRNA-expression data from 192 LUAD patients were obtained from TCGA via the firebrowse website (
http://firebrowse.org/). These data contained normalized expression (reads per million) levels of 1,046 known miRNAs extracted from LUAD tissues. Of these miRNAs, 237 exhibiting no expression (or no changes between samples) were discarded. Therefore, expression values from a total of 809 miRNAs were used to extract core miRNAs associated with the prognosis of LUAD patients using the CWx algorithm.
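The filtering step described above can be illustrated with a minimal Python sketch. It assumes the expression table is held as a mapping from miRNA name to per-sample values; the function name `filter_uninformative_mirnas`, the toy miRNA names, and the zero-variance criterion are illustrative assumptions and not necessarily the exact procedure used in this study.

```python
from statistics import pvariance


def filter_uninformative_mirnas(expression, min_variance=0.0):
    """Drop miRNAs with no expression or no change between samples.

    expression: dict mapping a miRNA name to a list of normalized
    expression values (one value per patient sample).
    min_variance: hypothetical threshold; 0.0 removes only constant rows.
    """
    kept = {}
    for name, values in expression.items():
        if all(v == 0 for v in values):
            continue  # not expressed in any sample
        if pvariance(values) <= min_variance:
            continue  # identical value in every sample
        kept[name] = values
    return kept


# Toy data with hypothetical miRNA names (not TCGA values).
toy_expression = {
    "mir-A": [0.0, 0.0, 0.0],     # never expressed, dropped
    "mir-B": [5.0, 5.0, 5.0],     # constant, dropped
    "mir-C": [1.0, 20.0, 300.0],  # variable, kept
}
filtered = filter_uninformative_mirnas(toy_expression)
print(sorted(filtered))
```

Applied to the data described here, a filter of this kind would reduce the 1,046 miRNAs to the 809 retained for CWx analysis, provided the same exclusion criteria are used.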
4.2. CWx Analysis
The CWx algorithm was used to identify prognosis-associated miRNAs in 192 TCGA LUAD patients. First, patients were divided into high- (
n = 98) and low-risk (
n = 94) groups depending on whether they had survived for 3 years (
Figure 1A). The number of miRNAs (features) was also reduced by ~50% at this step, after which the same process was conducted with different survival cut-offs (
Figure 1A). Finally, 197 miRNAs were ranked according to CWx scores (the higher the CWx score, the more relevant to the prognosis).
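The cascading structure of this procedure (split patients by a survival cut-off, score the features, keep roughly half, and repeat with another cut-off) can be sketched as follows. This is not the CWx implementation: the neural network-based scoring is replaced here by a simple stand-in (the absolute difference in mean expression between risk groups), censoring is ignored, and the cut-off values, function names, and toy data are hypothetical.

```python
from statistics import mean


def score_features(samples, labels, features):
    """Stand-in relevance score: |mean(high-risk) - mean(low-risk)|.

    CWx itself derives scores from trained neural networks; this simple
    statistic only illustrates where scoring fits in the cascade.
    Both risk groups must be non-empty.
    """
    scores = {}
    for f in features:
        high = [s[f] for s, y in zip(samples, labels) if y == 1]
        low = [s[f] for s, y in zip(samples, labels) if y == 0]
        scores[f] = abs(mean(high) - mean(low))
    return scores


def cascade_select(samples, survival_years, cutoffs):
    """Halve the feature set once per survival cut-off; return ranked features."""
    features = sorted(samples[0])
    scores = {}
    for cutoff in cutoffs:
        # High-risk (1): did not survive to the cut-off; low-risk (0): did.
        labels = [1 if t < cutoff else 0 for t in survival_years]
        scores = score_features(samples, labels, features)
        ranked = sorted(features, key=lambda f: scores[f], reverse=True)
        features = ranked[: max(1, (len(ranked) + 1) // 2)]  # keep about 50%
    return [(f, scores[f]) for f in features]


# Toy cohort: four patients, four hypothetical miRNAs.
toy_samples = [
    {"mir-A": 10.0, "mir-B": 1.0, "mir-C": 5.0, "mir-D": 2.0},
    {"mir-A": 9.0, "mir-B": 2.0, "mir-C": 5.0, "mir-D": 3.0},
    {"mir-A": 2.0, "mir-B": 1.0, "mir-C": 5.0, "mir-D": 6.0},
    {"mir-A": 1.0, "mir-B": 2.0, "mir-C": 5.0, "mir-D": 5.0},
]
toy_survival = [1.0, 2.0, 4.0, 6.0]
selected = cascade_select(toy_samples, toy_survival, cutoffs=[3.0, 5.0])
print(selected)
```

Halving the feature set at each stage is what lets a cascade of this shape narrow several hundred features to a ranked short list after only a few survival cut-offs.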
4.3. Cell Culture
Cell lines 393P, 344SQ (murine lung cancer), A549, and H1792 (human lung cancer) were cultured in RPMI 1640 (Welgene, Gyeongsan, Korea) with 10% fetal bovine serum (FBS; HyClone, Logan, UT, USA) at 37 °C in the presence of 5% CO₂. Murine lung cancer cells were established and transfected with
ZEB1 and miR-200 as described in our previous studies [
12,
43]. A549 cells were transduced with the pLMP-mCherry retroviral vector (a gift from Ken Scott, Baylor College of Medicine, Houston, TX, USA) to allow visualization via a red fluorescence signal.
hsa-miR-374a mimic,
hsa-miR-374b mimic, and negative controls were obtained from BIONEER (Daejeon, Korea) and transiently transfected into A549 or H1792 cells using TransIT-X2 transfection reagent (Mirus Bio, Madison, WI, USA) according to the manufacturer's instructions. For wound-healing assays, scratches were made with a 1000-µL pipette tip when cells became confluent in 6-well plates, and cells were cultured in complete media with 10% FBS in the presence of mitomycin C (1 µg/mL; Sigma-Aldrich, St. Louis, MO, USA) to block proliferation-related effects. After 24 h, the wound area was measured using ImageJ software (National Institutes of Health, Bethesda, MD, USA).
4.4. Quantitative RT-PCR (qRT-PCR)
We used WelPrep total RNA isolation reagent (Welgene) to isolate total RNA from cultured cells. To quantitate mRNA-expression levels, cDNA was first synthesized from total RNA by reverse transcription using the ELPIS RT Prime kit (Elpis-Biotech, Daejeon, Korea), and quantitative PCR assays were performed using a BioFACT A-Star real-time PCR kit including SFCgreen I (BioFACT, Daejeon, Korea) with the AriaMx real-time PCR system (Agilent Technologies, Santa Clara, CA, USA). mRNA levels were normalized to that of a housekeeping gene [ribosomal protein L32 (
RPL32)]. qRT-PCR primers used in this study are listed in
Table S4. To quantitate cellular miRNA levels, we used an HB miR Multi Assay kit (Heimbiotek, Seongnam, Korea) and normalized miRNA levels to that of
RNU6B snRNA.
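Normalization to a reference transcript can be illustrated with the widely used comparative Ct calculation. The text does not state which quantification formula was applied, so the 2^-ΔΔCt approach below, its assumption of roughly 100% amplification efficiency, the function names, and the Ct values are all assumptions made for illustration.

```python
def relative_expression(ct_target, ct_reference):
    """Return 2^-(Ct_target - Ct_reference).

    Assumes each PCR cycle doubles the product (about 100% efficiency).
    The reference would be RPL32 for mRNAs or RNU6B for miRNAs.
    """
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)


def fold_change(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Fold change of the test sample over the control (2^-ddCt)."""
    test = relative_expression(ct_target_test, ct_ref_test)
    ctrl = relative_expression(ct_target_ctrl, ct_ref_ctrl)
    return test / ctrl


# Hypothetical Ct values: the target crosses threshold two cycles earlier
# in the test sample, while the reference is unchanged.
fc = fold_change(ct_target_test=24.0, ct_ref_test=18.0,
                 ct_target_ctrl=26.0, ct_ref_ctrl=18.0)
print(fc)  # 4.0
```

Because the reference Ct is subtracted within each sample before samples are compared, differences in input RNA amount largely cancel out.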
4.5. RNA Extraction from FFPE Tumors
LUAD patients (n = 180) who underwent surgical resection with curative intent were retrospectively selected at Seoul St. Mary’s Hospital, Yeouido St. Mary’s Hospital, Bucheon St. Mary’s Hospital, or Uijeongbu St. Mary’s Hospital of Catholic Medical Center (Seoul, Korea). This study was approved by the institutional review board of Catholic Medical Center (No. UC17SESI0073). Total RNA was extracted from FFPE tumors from these patients using a miRNeasy FFPE kit (QIAGEN, Hilden, Germany) according to the manufacturer's instructions. RNA quantity and quality were assessed using a DS11 spectrophotometer (Denovix, Wilmington, DE, USA) and a fragment analyzer (Advanced Analytical Technologies, Ankeny, IA, USA).
4.6. MiRNA-Expression Profiling by NanoString
To measure miR-374a and miR-374b levels in FFPE samples, we performed an nCounter microRNA expression assay (NanoString Technologies, Seattle, WA, USA) using the human miRNA v3 assay kit by Philekorea (Seoul, Korea). Oligonucleotide-tagged miRNAs were hybridized with the human miRNA code set for 18 h at 65 °C, and the individual fluorescence intensity of target miRNAs was quantified using the nCounter digital analyzer, which was also used to obtain images of fluorescent reporters. miRNA data were normalized and analyzed using nSolver software (NanoString Technologies).
4.7. Western Blot
To isolate cellular proteins, cells were incubated with lysis buffer [50 mM Tris-HCl (pH 7.4), 150 mM EDTA, and 1% Triton X-100] containing protease inhibitors (Sigma-Aldrich). After electrophoresis (SDS-PAGE), proteins were transferred onto polyvinylidene difluoride (PVDF) membranes, and protein blots were incubated with primary antibodies and horseradish peroxidase-conjugated secondary antibodies (Cell Signaling Technology, Danvers, MA, USA). Protein bands were visualized with a PicoEPD (Enhanced Peroxidase Detection) Western reagent kit (Elpis-Biotech). We used antibodies against ZEB1 (#sc-25388; Santa Cruz Biotechnology, Dallas, TX, USA), vimentin (#sc-5565; Santa Cruz Biotechnology), and actin (#BS6007M; Bioworld Technology, St. Louis Park, MN, USA).
4.8. Spheroid Invasion Assay
Spheroid invasion assays were performed as described previously [
44]. Briefly, to create spheroids, lung cancer cells (1 × 10⁵ cells/5 mL) in 20% METHOCEL (Sigma-Aldrich) and 1% Matrigel (BD Biosciences, Franklin Lakes, NJ, USA) were hung on the lids of 15 cm dishes (50 μL/drop) and incubated at 37 °C for 2 days. Spheroids were then harvested in a 15 mL tube, which was placed in the incubator for 30 min to allow the spheroids to settle. Spheroids were mixed gently with collagen solution (3 mg/mL collagen in 0.5× phosphate-buffered saline and 0.01 N NaOH) and then implanted in the center of each well of a 12-well plate. After the collagen gels polymerized, the wells were filled with cell culture media. After 24 to 48 h, invading cells were observed under a Leica DMi8 inverted microscope (Leica Microsystems, Wetzlar, Germany), and the invasion ratio was calculated by dividing the total invasion area by the central spheroid area measured using ImageJ software (National Institutes of Health).
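The invasion ratio defined above is a simple quotient of two measured areas, as the following sketch shows. The function name, the area units, and the example measurements are hypothetical; only the ratio itself comes from the protocol.

```python
def invasion_ratio(total_invasion_area, spheroid_area):
    """Total invasion area divided by the central spheroid area.

    Both areas must be in the same units (for example, square pixels
    as exported from an image-analysis program).
    """
    if spheroid_area <= 0:
        raise ValueError("spheroid area must be positive")
    return total_invasion_area / spheroid_area


# Hypothetical (total invasion area, central spheroid area) pairs.
measurements = [(3.0, 1.5), (4.5, 1.5)]
ratios = [invasion_ratio(total, core) for total, core in measurements]
print(ratios)  # [2.0, 3.0]
```

Dividing by the central spheroid area makes spheroids of different starting sizes comparable.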
4.9. RNA Sequencing
Total RNA was isolated from A549 cells transfected with miR-374a mimic, miR-374b mimic, or negative control in triplicate using an AccuPrep Universal RNA Extraction kit (BIONEER). NGS-based RNA sequencing for global mRNA transcriptome profiling was performed as described previously [
44]. Briefly, after assessing the quantity and quality of RNA samples, a total RNA library was constructed using the Illumina TruSeq stranded mRNA sample prep kit (Illumina, San Diego, CA, USA). Indexed libraries were then submitted to Illumina NovaSeq (Illumina), and paired-end (2 × 100 bp) sequencing was performed by Macrogen (Seoul, Korea). Octopus-toolkit [
45] was used to analyze RNA-sequencing data.