Identification of Novel Micropeptides Derived from Hepatocellular Carcinoma-Specific Long Noncoding RNA

Identification of cancer-specific target molecules and biomarkers may be useful for the development of novel therapeutic and immunotherapeutic strategies. We have recently demonstrated that the expression of long noncoding RNAs (lncRNAs) can be cancer-type specific owing to abnormal chromatin remodeling and alternative splicing. Furthermore, we found that the long intergenic noncoding RNA Linc00176, which is expressed predominantly in hepatocellular carcinoma (HCC), encodes a functional small protein, C20orf204-189AA, that enhances transcription of ribosomal RNAs and supports HCC growth. In this study, we combined RNA sequencing and polysome profiling to identify novel micropeptides that originate from HCC-specific lncRNAs. We identified nine lncRNAs that are expressed exclusively in HCC cells but not in the liver or other normal tissues. DNase-sequencing data revealed that altered chromatin structure plays a key role in the HCC-specific expression of these lncRNAs. Three of the nine HCC-specific lncRNAs contain at least one open reading frame (ORF) longer than 50 amino acids (aa) and are enriched in the polysome fraction, suggesting that they are translated. We generated a peptide-specific antibody to characterize one candidate, NONHSAT013026.2/Linc013026. We show that Linc013026 encodes a 68 aa micropeptide that is mainly localized at the perinuclear region. Linc013026-68AA is expressed in a subset of HCC cells and plays a role in cell proliferation, suggesting that it may serve as an HCC-specific target molecule. Our findings also shed light on the role of the previously ignored 'dark proteome', which originates from noncoding regions, in the maintenance of cancer.


Introduction
Hepatocellular carcinoma (HCC) is one of the most prevalent tumor types worldwide [1]; however, current treatment options are limited, and precise and effective therapeutic strategies do not exist [2]. HCC typically arises on a background of chronic liver disease, with risk factors including viral or autoimmune hepatitis, chronic alcohol abuse, and nonalcoholic fatty liver disease [3]. These risk factors trigger aberrant liver regeneration, which initiates the formation of HCC. However, the underlying molecular mechanism is still largely unknown. Exome sequencing of HCC has recently shown that 161 putative driver genes are associated with 11 recurrently altered pathways in HCC development, suggesting that many signaling pathways are altered to a modest degree and act together [4][5][6]. Notably, 28% of the altered gene products are involved in chromatin-remodeling complexes, suggesting that HCC expresses unique genes that are not expressed in normal hepatocytes. In this context, we have previously shown that a subset of lncRNAs that are predominantly expressed in HCC act as fine tuners in cancer formation and/or maintenance [7,8]. Thus, a potential strategy for cancer therapy may be to target multiple cancer type-specific fine tuners, including noncoding RNAs.
Traditional annotation of protein-coding genes relied on assumptions such as that one open reading frame (ORF) encodes one protein and that translated proteins have a minimal length [9]. However, recent data from our laboratory and others have revealed that RNAs previously considered noncoding, such as long noncoding RNAs (lncRNAs) and circular RNAs, are translated into functional small proteins [10][11][12][13], suggesting that the proteome is more complex than previously anticipated.
In the present study, we used RNA sequencing (RNA-seq) and polysome profiling to identify novel micropeptides that originate from HCC-specific lncRNAs. We identified three HCC-specific lncRNAs that contain small ORFs and are associated with translating polysomes. Using a peptide-specific antibody, we characterized one lncRNA candidate, NONHSAT013026.2/Linc013026. Linc013026 is translated into a 68 amino acid micropeptide, Linc013026-68AA, that is mainly localized at the perinuclear region. Notably, Linc013026-68AA is predominantly expressed in moderately- but not well-differentiated HCC cells and plays a role in cell proliferation, suggesting that it may be used as an HCC-specific target molecule. Our data also uncover an important role of previously ignored small ORFs originating from noncoding regions in the maintenance of cancer.

Identification of Hepatocellular Carcinoma-Specific lncRNAs
To identify lncRNAs that are expressed in HCC cells but not in normal hepatocytes, we used publicly available RNA-sequencing (RNA-seq) data generated by the ENCODE Consortium [14] to extract lncRNA expression levels. First, we mapped RNA-seq data from normal liver (ENCFF184YUO) and from the HCC cell line HepG2 (ENCFF337WTM) to long intergenic noncoding RNAs (lincRNAs) annotated by NONCODE v5.0, an integrated knowledge database of noncoding RNAs [15] (Figure 1A). We limited our study to lincRNAs because RNA-seq based on second-generation (short-read) sequencing cannot accurately assign reads to lncRNAs that overlap coding genes. RNA-seq datasets were normalized using cuffnorm [16]. A total of 906 lincRNAs that were expressed in HepG2 cells but not in the liver were selected. To identify HCC-specific lincRNAs, we further examined the expression of these candidates in normal tissues, including adipose, adrenal, brain, breast, colon, foreskin, heart, kidney, lung, ovary, placenta, prostate, skeletal muscle, testis and thyroid tissues and leukocytes, using RNA-seq data generated by the Human Body Map 2.0 [17]. Twelve of the 906 lincRNAs were expressed exclusively in HepG2 cells (Figure 1A, Table S1). We then confirmed the expression of these lincRNAs in HepG2 cells using our previously published RNA-seq data [7]. The expression of nine of the twelve HCC-specific lincRNAs was confirmed (Figure 1B, HepG2 (GSE115139)).
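The two-step filter described above (expressed in HepG2 but not in liver, then absent from every Human Body Map 2.0 tissue) can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: the input layout (a dictionary of normalized expression values per lincRNA) and the thresholds `EXPRESSED` and `ABSENT` are hypothetical, since the text does not state the cut-offs used after cuffnorm normalization.

```python
from __future__ import annotations

# Hypothetical thresholds on normalized expression (e.g. FPKM after cuffnorm);
# the study does not report the cut-offs it used.
EXPRESSED = 1.0  # at or above this value, a lincRNA counts as expressed
ABSENT = 0.1     # below this value, a lincRNA counts as not expressed


def hepg2_specific(expr: dict[str, dict[str, float]],
                   normal_tissues: list[str]) -> list[str]:
    """Return lincRNA IDs expressed in HepG2 but absent from liver and all listed tissues."""
    hits = []
    for lincrna, values in expr.items():
        # Step 1: HepG2 versus normal liver (ENCODE data).
        if values["HepG2"] < EXPRESSED or values["liver"] >= ABSENT:
            continue
        # Step 2: absent from every normal tissue (Human Body Map 2.0 data).
        if all(values[tissue] < ABSENT for tissue in normal_tissues):
            hits.append(lincrna)
    return sorted(hits)


# Toy data; two tissues stand in for the full tissue panel listed above.
tissues = ["brain", "kidney"]
toy = {
    "lincA": {"HepG2": 12.0, "liver": 0.0, "brain": 0.0, "kidney": 0.05},
    "lincB": {"HepG2": 8.0, "liver": 0.0, "brain": 3.2, "kidney": 0.0},
    "lincC": {"HepG2": 5.0, "liver": 2.0, "brain": 0.0, "kidney": 0.0},
}
print(hepg2_specific(toy, tissues))  # ['lincA']
```

Here `lincB` is dropped because it is expressed in brain, and `lincC` because it is expressed in normal liver.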
We next asked why these lincRNAs are expressed exclusively in HepG2 cells but not in the liver. We have previously demonstrated that altered chromatin structure in cancer results in the cancer-specific expression of a subset of genes [7,8]. Thus, we examined the chromatin structure at the putative promoter regions of the nine HCC-specific lincRNAs using DNase-sequencing (DNase-seq) data generated by the ENCODE Consortium [18]. DNase-seq data (Figure 1B, DNase-seq) obtained from human hepatocytes (ENCFF851CVH) and HepG2 cells (ENCFF474LSZ) revealed that HepG2 cells contain DNase I hypersensitive sites at the proximal promoter regions of seven of the nine lincRNAs (all except NONHSAT204527.1 and NONHSAT223630.1 (Figure 1C)), whereas normal human liver does not contain these sites at these positions (Figure 1B, blue arrow), suggesting that the chromatin structure in these regions is remodeled in HCC cells. To examine whether these open chromatin regions are associated with cis-regulatory elements, we used the candidate cis-Regulatory Elements (cCREs) database generated by the ENCODE Consortium, which contains 1,063,878 human cCREs [19]. Notably, the putative promoter regions of five of these seven lncRNA candidates contain at least one cCRE (Figure 1B, ENCODE cCRE, blue mark). In addition, ChIP-seq data for H3K4 trimethylation and H3K27 acetylation, chromatin marks of active transcription, revealed that the putative promoter regions of the seven lncRNA candidates are transcriptionally active in HepG2 cells (Figure S2, H3K4me3 and H3K27Ac). These data suggest that transcription may be initiated from these regions. We therefore used cap analysis of gene expression (CAGE) data from HepG2 cells, which map transcription start sites (Figure 1B, CAGE).
In agreement with the RNA-seq data, the CAGE-defined cap sites are located within the open promoter regions identified by DNase-seq (Figure 1B, Transcription Start Site (TSS, black arrow)), suggesting that altered chromatin structure plays a key role in the HCC-specific expression of lincRNAs.
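The promoter comparison can be illustrated as an interval-overlap test: a lincRNA is flagged when a DNase I hypersensitive site (DHS) falls within its promoter window in HepG2 but not in hepatocytes. The sketch below is hypothetical; the window size (`PROMOTER_UP`, `PROMOTER_DOWN`), the half-open coordinate convention and the assumption that all peaks lie on the lincRNA's chromosome are illustrative choices, not parameters reported in the study.

```python
from __future__ import annotations

# Hypothetical promoter definition; the study does not give a window size.
PROMOTER_UP = 1000   # bp upstream of the TSS
PROMOTER_DOWN = 500  # bp downstream of the TSS


def promoter_window(tss: int, strand: str) -> tuple[int, int]:
    """Half-open [start, end) window around a TSS, oriented by strand."""
    if strand == "+":
        return tss - PROMOTER_UP, tss + PROMOTER_DOWN
    return tss - PROMOTER_DOWN, tss + PROMOTER_UP


def overlaps(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def has_dhs(window: tuple[int, int], peaks: list[tuple[int, int]]) -> bool:
    """True if any DHS peak overlaps the promoter window."""
    return any(overlaps(window, peak) for peak in peaks)


def opened_in_cancer(tss: int, strand: str,
                     hepg2_peaks: list[tuple[int, int]],
                     hepatocyte_peaks: list[tuple[int, int]]) -> bool:
    """DHS at the promoter in HepG2 but not in hepatocytes (same chromosome assumed)."""
    window = promoter_window(tss, strand)
    return has_dhs(window, hepg2_peaks) and not has_dhs(window, hepatocyte_peaks)


print(opened_in_cancer(10_000, "+", [(9_600, 9_900)], [(20_000, 20_300)]))  # True
```

In practice, genome-wide overlaps of this kind are usually computed with dedicated interval tools rather than pairwise loops; the loop is kept here for readability.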
In addition to chromatin structure, tissue-specific transcription factors (TFs) are well known to activate tissue-specific expression programs [20]. We therefore examined which TFs potentially activate the transcription of HCC-specific lncRNA genes using ChIP-seq datasets for 340 factors generated by the ENCODE Consortium in HepG2 cells. All HCC-specific lncRNA genes are potentially activated by three to seven TFs (Figure S2). Notably, these TFs are expressed in both normal liver and primary HCC (Figure S3), suggesting that open chromatin structure at the promoter region, rather than the presence of particular TFs, may play a key role in the HCC-specific expression of lncRNAs.
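Counting the TFs with ChIP-seq peaks at each promoter, and then keeping those expressed in both normal liver and primary HCC, follows the same overlap logic. In this hypothetical sketch, the TF names, peak coordinates, expression values and the `min_level` threshold are placeholders rather than ENCODE data.

```python
from __future__ import annotations


def tfs_at_promoter(window: tuple[int, int],
                    chip_peaks: dict[str, list[tuple[int, int]]]) -> set[str]:
    """TFs with at least one ChIP-seq peak overlapping the half-open promoter window."""
    start, end = window
    return {tf for tf, peaks in chip_peaks.items()
            if any(p_start < end and start < p_end for p_start, p_end in peaks)}


def expressed_in_both(tfs: set[str], liver: dict[str, float],
                      hcc: dict[str, float], min_level: float = 1.0) -> set[str]:
    """Subset of TFs expressed (>= min_level) in both normal liver and primary HCC."""
    return {tf for tf in tfs
            if liver.get(tf, 0.0) >= min_level and hcc.get(tf, 0.0) >= min_level}


# Placeholder peaks and expression values.
peaks = {"TF_A": [(9_800, 10_100)], "TF_B": [(9_200, 9_400)], "TF_C": [(50_000, 50_200)]}
bound = tfs_at_promoter((9_000, 10_500), peaks)
print(sorted(bound))  # ['TF_A', 'TF_B']
print(expressed_in_both(bound, {"TF_A": 20.0, "TF_B": 5.0}, {"TF_A": 18.0, "TF_B": 0.2}))  # {'TF_A'}
```

If every bound TF passes the second filter, TF availability alone cannot explain why the lincRNA is silent in normal liver, which is the reasoning the paragraph above applies.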

Identification of Micropeptide Candidates Derived from HCC-Specific lncRNAs
Six of the nine lincRNAs contain at least one open reading frame (ORF) longer than 50 amino acids (AA) (Table 1). To be translated into micropeptides, lincRNAs must be exported to the cytoplasm. We therefore examined the export of these six lincRNA candidates using nuclear and cytoplasmic RNA-seq data generated by the ENCODE Consortium. Except for NONHSAT142412.2, the other five lincRNAs were clearly detected in cytoplasmic RNA-seq (Figure 2A), suggesting that they are exported to the cytoplasm. We also confirmed their export using RT-PCR (Figure 2B). To examine whether these five lincRNA candidates are endogenously translated in HepG2 cells, we isolated the polysome fraction of HepG2 cells by sucrose gradient centrifugation [11] and performed qRT-PCR and RT-PCR for the five lincRNAs. Actin mRNA was used as a positive control. Three of the five lincRNA candidates were detected in the translated fractions of HepG2 cells (Figures 2C and S1; NONHSAT013026.2, NONHSAT168790.1 and NONHSAT250607.1), suggesting that they are translated.
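A common way to express qRT-PCR results from sucrose gradient fractions is the share of a transcript found in each fraction, computed from Ct values as 2^-Ct and normalized to the sum over all fractions. The text does not state how enrichment was quantified, so the sketch below is an assumption: it presumes equal input per fraction and roughly 100% amplification efficiency, and the fraction indices treated as polysomal are hypothetical.

```python
from __future__ import annotations


def fraction_distribution(ct_values: list[float]) -> list[float]:
    """Percent of a transcript in each gradient fraction, from qRT-PCR Ct values.

    Assumes equal input per fraction and ~100% efficiency, so relative amount = 2**-Ct.
    """
    amounts = [2.0 ** -ct for ct in ct_values]
    total = sum(amounts)
    return [100.0 * amount / total for amount in amounts]


def percent_in_polysomes(ct_values: list[float], polysome_fractions: list[int]) -> float:
    """Sum of per-fraction percentages over the (hypothetical) polysomal fraction indices."""
    distribution = fraction_distribution(ct_values)
    return sum(distribution[i] for i in polysome_fractions)


# Hypothetical Ct values for four fractions: free RNA, monosome, light and heavy polysomes.
cts = [25.0, 25.0, 24.0, 24.0]
print(round(percent_in_polysomes(cts, [2, 3]), 1))  # 66.7
```

A transcript such as actin mRNA, used above as a positive control, would be expected to show most of its signal in the polysomal fractions under this calculation.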

NONHSAT013026.2/Linc013026-68AA Is Translated into a 68 Amino Acid Long Micropeptide
Among the three translated lincRNA candidates, NONHSAT013026.2 showed the highest enrichment in the translated fraction; we therefore focused on this lincRNA, which we renamed Linc013026. Linc013026 potentially encodes two ORFs of 52 and 68 AA. Because ORF-68AA has a Kozak consensus sequence, we focused on this ORF. First, we examined whether Linc013026-68AA is translated into a stable micropeptide using an in vitro transcription/translation assay and overexpression in cells. Linc013026-68AA is predicted to encode a micropeptide of 8 kDa [21]. The in vitro transcription/translation assay with Linc013026-68AA revealed a single band at 8-10 kDa (Figure 3A, arrow). To examine whether the Linc013026-68AA protein is stable in cells, we transfected HeLa cells with C-terminally GFP- or Myc-tagged Linc013026-68AA. GFP-specific immunoblotting revealed a band at ~38 kDa for GFP-tagged Linc013026-68AA, corresponding to a molecular mass of ~10 kDa for Linc013026-68AA (Figure 3B, 68AA-GFP), whereas a band of ~15 kDa was observed for Myc-tagged Linc013026-68AA (Figure 3C). Because some proteins can form dimers in SDS-PAGE (0.1% SDS) [22], we used a GST pull-down assay to examine the interaction of N-terminally GST-tagged 68AA with C-terminally Myc-tagged 68AA. As shown in Figure 3D, no interaction between GST-68AA and 68AA-Myc was detected. In addition, we observed a ~15 kDa band for 68AA-Myc in cell lysates pre-treated with 1% or 2% SDS under reducing conditions (Figure 3E). These data suggest that Myc-tagged Linc013026-68AA does not form a dimer.
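The sizes quoted above can be sanity-checked with simple arithmetic. This sketch assumes the common rule of thumb of ~110 Da per amino acid residue and an approximate GFP tag mass of ~27 kDa; neither value comes from the study, and linker residues are ignored.

```python
AVG_RESIDUE_DA = 110.0  # rule-of-thumb average residue mass, not a sequence-based value
GFP_KDA = 27.0          # approximate mass of GFP; linker residues ignored


def predicted_kda(n_residues: int) -> float:
    """Rough molecular mass (kDa) of a peptide from its length alone."""
    return n_residues * AVG_RESIDUE_DA / 1000.0


def untagged_kda(apparent_fusion_kda: float, tag_kda: float) -> float:
    """Apparent size of the peptide after subtracting the tag from the fusion band."""
    return apparent_fusion_kda - tag_kda


print(round(predicted_kda(68), 2))  # 7.48
print(untagged_kda(38.0, GFP_KDA))  # 11.0
```

With these assumptions, a 68-residue peptide comes out at ~7.5 kDa, in line with the predicted 8 kDa, and the 38 kDa GFP fusion implies roughly 11 kDa for the peptide portion, close to the ~10 kDa stated above; the exact value depends on the tag and linker masses assumed.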
We then examined the peptide sequence of Linc013026-68AA. Linc013026-68AA contains five potential serine, two potential threonine and one potential tyrosine phosphorylation sites (Table S3) and one potential lysine acetylation site (G-AcK) [23]. To determine whether phosphorylation affects the migration of Linc013026-68AA in SDS-PAGE, we treated cell lysates with Lambda Protein Phosphatase (Lambda PP), which dephosphorylates phosphorylated tyrosine, serine and threonine residues. Upon Lambda PP treatment, we observed two additional bands of ~14 and ~10 kDa (Figure 3F (*)), suggesting that phosphorylation contributes to the slower migration of Linc013026-68AA in SDS-PAGE.
To examine the endogenous expression of Linc013026-68AA, we generated rabbit antibodies against a mixture of two synthetic peptides corresponding to amino acid positions 4-17 (peptide I) and 54-68 (peptide II) of Linc013026-68AA (Kaneka Eurogentec S.A., Belgium) (amino acid sequences are shown in Figure 3G). First, we tested the specificity of the antibodies. By immunoblotting with anti-peptide I and anti-peptide II antibodies, the 38 kDa band of GFP-tagged Linc013026-68AA was specifically detected (Figure 3H). This band was not detected by the peptide-absorbed antibody (Figure 3H, anti-peptide II + peptide II). We then examined the subcellular localization of exogenous and endogenous Linc013026-68AA using immunofluorescence (IF) and immunohistochemical (IHC) staining. HeLa cells were transfected with Myc-tagged Linc013026-68AA and immunostained with anti-Linc013026-68AA and Myc-specific antibodies. Myc-specific staining detected Myc-tagged Linc013026-68AA mainly at the perinuclear region (Figure 3I). The anti-peptide II antibody, but not the anti-peptide I antibody, gave a strong IF signal that completely overlapped with the Myc-specific signal (Figure 3I, Merged). Next, we tested the Linc013026-68AA antibodies for IHC staining.
In agreement with the IF staining, the anti-peptide II antibody, but not the anti-peptide I antibody, gave a strong signal for Myc-tagged Linc013026-68AA at the perinuclear region (Figure 3J). Thus, we used the anti-peptide II antibody to examine the endogenous expression of Linc013026-68AA in HepG2 cells. We first depleted Linc013026 using siRNA in HepG2 cells (Figure 3K) and performed immunoblotting with the anti-peptide II antibody. In control cells, three major bands ranging from 10 to 15 kDa were detected (Figure 3L, (*)). Upon Linc013026 depletion, the intensity of these bands was reduced. The presence of multiple bands suggests that endogenous Linc013026-68AA may also be phosphorylated, as observed for exogenous Linc013026-68AA (Figure 3F). We also examined the endogenous expression of Linc013026-68AA by immunohistochemical staining with the anti-peptide II antibody. In control cells, endogenous Linc013026-68AA was detected mainly at the perinuclear region (Figure 3M, siCr), in agreement with the subcellular localization of exogenous Linc013026-68AA. Upon Linc013026 depletion, the staining signal was clearly reduced (Figure 3M, si68AA). In sum, immunoblotting and IHC staining suggest that Linc013026 is endogenously translated into the 68 aa micropeptide.

Linc013026-68AA Enhances Cell Proliferation
Our recent data showed that a protein derived from a lncRNA can have a biological function [11]. Because HeLa cells do not express Linc013026-68AA (Figure 3I), we expressed Myc-tagged Linc013026-68AA in HeLa cells (Figure 4A). We next examined whether Linc013026-68AA influences cell growth using crystal violet staining and Wst-1 assays. After 2 days, growth of Linc013026-68AA-overexpressing HeLa cells was approximately 1.7-fold (crystal violet assay) and 1.6-fold (Wst-1 assay) greater than that of control vector-transfected HeLa cells (Figure 4B,C). Furthermore, depletion of Linc013026 RNA in HepG2 cells reduced cell proliferation approximately two-fold within 3 days, as measured by crystal violet (Figure 4D) and Wst-1 assays (Figure 4E), suggesting that Linc013026-68AA promotes cell proliferation. We next examined the Linc013026-68AA transcript in several HCC cell lines, namely HepG2, Hep3B, C3A, Huh7 and HLE. HeLa cells were used as a negative control. Linc013026-68AA is expressed in two of the five HCC cell lines (Figure 4F). We therefore overexpressed Linc013026-68AA in Huh7 and HLE cells, two HCC cell lines that express Linc013026-68AA at low levels. These cells also showed

Discussion
In most human cancers, a large number of proteins with driver mutations are involved in tumor development, implying that multiple fine tuners are involved in cancer formation and/or maintenance. A useful strategy for cancer therapy may therefore be to target multiple cancer-specific fine tuners. In this study, using hepatocellular carcinoma as a model system, we combined RNA-seq and polysome profiling to identify novel micropeptides derived from cancer-specific lncRNAs. We identified nine lincRNAs that are expressed exclusively in HCC cells but not in normal liver or other tissues (Table S1). Three of the nine lincRNAs contain small ORFs longer than 50 amino acids and are enriched in polysome fractions (Figure 2C), suggesting that they are translated in a cancer-specific manner. Using a peptide-specific antibody, we characterized one of our candidates, NONHSAT013026.2/Linc013026. We show that Linc013026 encodes a 68 amino acid micropeptide, Linc013026-68AA, that is mainly localized at the perinuclear region (Figure 3I,J,M). Linc013026-68AA is expressed in a subset of HCC cells and plays a role in cell proliferation (Figure 4). We are currently performing an interactome analysis of Linc013026-68AA to gain insight into the molecular mechanism(s) by which it acts. Micropeptides have been shown to be involved in muscle performance [12] and growth [13]. In addition, SPAR polypeptides encoded by Linc00961 regulate mTORC1 and muscle regeneration [24], and another micropeptide, mitoregulin, is involved in protein complex assembly in mitochondria [25]. Recently, we demonstrated that C20orf204-189AA, encoded by the lincRNA Linc00176, stabilizes nucleolin and promotes ribosomal RNA transcription [11]. These findings shed light on the role of the previously ignored 'dark proteome' in the maintenance of cancer. Thus, further characterization of the coding potential of other cancer-specific lincRNAs (Table 1) may provide clues for the identification of novel cancer-specific fine tuners.
Micropeptides encoded by cancer-specific lncRNAs may also be useful biomarkers for cancer diagnosis.
Why is the expression of a subset of lncRNAs cancer-specific? Recent data identified 161 putative driver genes associated with 11 recurrently altered pathways in HCC development [4], and these mutations were not observed in chronic hepatitis or cirrhosis (preneoplastic stages). Interestingly, 28% of the altered gene products play a role in chromatin remodeling, suggesting that abnormal chromatin remodeling results in the cancer-specific expression of a subset of genes [7,8,26]. Indeed, DNase-seq data, which map chromatin accessibility, revealed that chromatin at the putative promoter regions of seven of the nine HCC-specific lincRNAs is open in HCC cells but not in normal liver (Figure 1B). Accessible promoters then enable the recruitment of transcription factors, which subsequently activate the transcription of these genes. These data also suggest that cancer cells exhibit remarkable transcriptome alterations, partly through cancer-specific chromatin remodeling events.
One limitation of this study is the lack of clinical data for the HCC-specific lncRNA candidates. Examining the expression of these lncRNAs in RNA-seq data of primary HCC generated by The Cancer Genome Atlas (TCGA) or the International Cancer Genome Consortium (ICGC) would indicate whether they could be suitable HCC-specific biomarkers. However, retrieving expression data from open-access resources requires gene annotation by GENCODE [27], whereas many NONCODE lncRNA genes, including the lncRNA candidates identified in this study, are not yet annotated by GENCODE [28]. Thus, it is currently not possible to retrieve the expression of our lncRNA candidates from open-access data resources. We are currently examining the protein expression of Linc013026-68AA in primary HCC samples and tumor-adjacent normal liver tissues to determine whether it could serve as an HCC biomarker. Furthermore, the role of Linc013026-68AA in tumor growth in vivo should be examined to clarify whether it may be suitable as an HCC-specific target molecule.
Our study identifies candidate target molecules and biomarkers originating from noncoding RNAs that may support a novel strategy for cancer treatment that targets multiple cancer type-specific fine tuners.

Cell Culture, siRNA, and Transfection
HepG2, Huh7, HLE, C3A and HeLa cells were purchased from the American Type Culture Collection (ATCC, Manassas, VA, USA) or the DSMZ-German Collection of Microorganisms and Cell Cultures (DSMZ, Braunschweig, Germany). They were grown in DMEM supplemented with 10% FCS. All cell lines were free of mycoplasma contamination.

Wst-1 Assay
HeLa cells (500-2000 cells/well) were seeded in duplicate in a 96-well plate, transfected with control vector or Linc013026-68AA, and incubated for 2 days. A Wst-1 proliferation assay kit (Roche Diagnostics, Basel, Switzerland) was used according to the manufacturer's instructions.

Crystal Violet Assay
HeLa, HepG2, Huh7 and HLE cells (500-2000 cells/well) were seeded in duplicate in a 96-well plate, transfected with control vector, Linc013026-68AA or siRNAs, and incubated for 2 days. Cells were then washed with phosphate-buffered saline (PBS) and fixed with methanol. Crystal violet dye was applied for 10 min. After the plate had air-dried, the dye was solubilized in methanol and absorbance was measured at 595 nm.
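Crystal violet and Wst-1 readouts are typically compared as the fold change of blank-corrected absorbance relative to the control. The sketch below assumes cell-free blank wells for background subtraction, which the protocol above does not mention; the function name and example values are hypothetical.

```python
from __future__ import annotations

from statistics import mean


def fold_change(treated: list[float], control: list[float], blank: list[float]) -> float:
    """Ratio of blank-corrected mean absorbance (e.g. A595), treated over control."""
    background = mean(blank)  # assumed cell-free wells
    treated_signal = mean(od - background for od in treated)
    control_signal = mean(od - background for od in control)
    return treated_signal / control_signal


# Hypothetical duplicate wells after 2 days.
print(round(fold_change([1.05, 1.05], [0.55, 0.55], [0.05, 0.05]), 2))  # 2.0
```

The same function applies to Wst-1 absorbance values, since both assays report cell number as an optical density.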

mRNA Export Assay
Nuclear and cytoplasmic RNA were isolated as previously described [30,31]. Briefly, cells were washed three times with ice-cold PBS and incubated for 5 min on ice in cytoplasmic buffer (100 mM Tris-HCl pH 8.0, 150 mM NaCl, 0.5% (v/v) NP-40, protease inhibitor cocktail [Sigma-Aldrich, St. Louis, MO, USA]) supplemented with RNase inhibitor (NEB, Ipswich, MA, USA). Cells were then harvested, and nuclei were pelleted by centrifugation. Nuclear and cytoplasmic RNAs were isolated using the ReliaPrep™ miRNA Cell and Tissue Miniprep System (Promega, Madison, WI, USA) according to the manufacturer's instructions.
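Preparing the cytoplasmic buffer from stock solutions is a C1V1 = C2V2 calculation. In this sketch the stock concentrations (1 M Tris-HCl pH 8.0, 5 M NaCl, 10% (v/v) NP-40) and the 10 mL final volume are hypothetical; protease and RNase inhibitors would be added separately as directed by their manufacturers.

```python
from __future__ import annotations

# Final concentrations from the protocol above (mM for Tris-HCl and NaCl, % v/v for NP-40).
FINAL = {"Tris-HCl pH 8.0": 100.0, "NaCl": 150.0, "NP-40": 0.5}
# Hypothetical stock concentrations, in the same units as FINAL.
STOCKS = {"Tris-HCl pH 8.0": 1000.0, "NaCl": 5000.0, "NP-40": 10.0}


def stock_volumes(final: dict[str, float], stocks: dict[str, float],
                  total_ml: float) -> dict[str, float]:
    """Volume of each stock (mL) from C1*V1 = C2*V2, i.e. V1 = C2*V2/C1."""
    return {name: final[name] * total_ml / stocks[name] for name in final}


volumes = stock_volumes(FINAL, STOCKS, total_ml=10.0)
water_ml = 10.0 - sum(volumes.values())  # ignores the small volume of inhibitors
for name, ml in volumes.items():
    print(f"{name}: {ml:.2f} mL")
print(f"water: {water_ml:.2f} mL")
```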

Semi-Quantitative RT-PCR and qRT-PCR Analysis
RNA was isolated from cells with the ReliaPrep™ miRNA Cell and Tissue Miniprep System (Promega, Madison, WI, USA) according to the manufacturer's instructions. One microgram of RNA was reverse-transcribed using oligo dT or random primers and ProtoScript® II Reverse Transcriptase (NEB, Ipswich, MA, USA) following the manufacturer's instructions. One-twentieth of the cDNA mix was used for real-time PCR with 10 pmol of forward and reverse primers and the ORA™ qPCR Green Rox kit (HighQu, Kraichtal, Germany) on a Qiagen Rotor-Gene instrument. mRNA expression levels were normalized to glyceraldehyde-3-phosphate dehydrogenase (GAPDH) mRNA levels. Primer sequences are shown in Table S2.
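Normalization to GAPDH is often done with the comparative Ct (2^-ΔΔCt) method. The text does not name the calculation used, so this is an assumed example; it presumes near-100% efficiency for both primer pairs, and the Ct values are hypothetical.

```python
def relative_expression(ct_target: float, ct_gapdh: float,
                        ct_target_ctrl: float, ct_gapdh_ctrl: float) -> float:
    """Comparative Ct (2^-ddCt): target level in a sample relative to a control, GAPDH-normalized."""
    delta_sample = ct_target - ct_gapdh             # dCt of the sample
    delta_control = ct_target_ctrl - ct_gapdh_ctrl  # dCt of the control (calibrator)
    return 2.0 ** -(delta_sample - delta_control)


# Hypothetical example: siRNA-treated versus control-siRNA cells.
level = relative_expression(27.0, 18.0, 25.0, 18.0)
print(level)  # 0.25
print(f"knockdown: {100 * (1 - level):.0f}%")  # knockdown: 75%
```

A two-cycle increase in ΔCt corresponds to a four-fold lower transcript level under the efficiency assumption.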

Statistical Analysis
Cell experiments were performed in triplicate, and a minimum of three independent experiments were evaluated. Data are reported as mean values with standard deviation. The statistical significance of differences between groups was determined by two-sided Student's t-test.
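A two-sided Student's t-test with pooled variance can be computed with the Python standard library alone, as sketched below; in practice a statistics package would normally be used. The p-value uses the identity p = I_x(df/2, 1/2) with x = df/(df + t^2), where I is the regularized incomplete beta function, evaluated here with the continued-fraction method described in Numerical Recipes. Function names and example data are hypothetical.

```python
from __future__ import annotations

import math
from statistics import mean, variance


def _betacf(a: float, b: float, x: float) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny, eps = 1e-300, 3e-14
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 201):
        m2 = 2 * m
        # Even step, then odd step, of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor close to 1: converged
            break
    return h


def regularized_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def two_sided_p(t: float, df: int) -> float:
    """Two-sided p-value of Student's t distribution with df degrees of freedom."""
    return regularized_beta(df / 2.0, 0.5, df / (df + t * t))


def student_t_test(a: list[float], b: list[float]) -> tuple[float, int, float]:
    """Two-sample Student's t-test with pooled variance (equal variances assumed)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    return t, df, two_sided_p(t, df)


# Hypothetical triplicate measurements.
t, df, p = student_t_test([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(f"t = {t:.3f}, df = {df}, p = {p:.4f}")
```

The pooled-variance form matches the classical Student's t-test named above; if group variances differed markedly, Welch's variant would be the usual alternative.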