FibroDB: Expression Analysis of Protein-Coding and Long Non-Coding RNA Genes in Fibrosis

Most long non-coding RNAs (lncRNAs) are expressed at lower levels than protein-coding genes and their expression is often restricted to specific cell types, certain time points during development, and various stress and disease conditions, respectively. To revisit this long-held concept, we focused on fibroblasts, a common cell type in various organs and tissues. Using fibroblasts and changes in their expression profiles during fibrosis as a model system, we show that the overall expression level of lncRNA genes is significantly lower than that of protein-coding genes. Furthermore, we identified lncRNA genes whose expression is upregulated during fibrosis. Using dermal fibroblasts as a model, we performed loss-of-function experiments and show that the knockdown of the lncRNAs LINC00622 and LINC01711 result in gene expression changes associated with cellular and inflammatory responses, respectively. Since there are no lncRNA databases focused on fibroblasts and fibrosis, we built a web application, FibroDB, to further promote functional and mechanistic studies of fibrotic lncRNAs.


Introduction
Fibroblasts are the most common cell type in the connective tissue and are found throughout the mammalian body [1,2]. They synthesize and secrete extracellular matrix (ECM) proteins and collagens to maintain the tissue structure. They can be easily isolated from each tissue by simply explanting a piece of tissue in a cell-culture dish as fibroblasts outgrow from such tissue and adhere to the plastic cell-culture dish [3]. In vitro, the morphology of fibroblasts is distinct and described as large, flat, and spindle-shaped cells [4,5]. Upon injury (e.g., an open-wound injury in the skin, myocardial infarction), fibroblasts are activated (called myofibroblasts), which will result in the proliferation of myofibroblasts as well as the deposition of excessive ECM, leading to progressive tissue scarring, fibrosis, and organ dysfunctions [6][7][8]. The characterization of fibroblasts has been a focus of intensive research for many years [9][10][11][12]. However, the main outstanding issue in the field is that there is no single gene/protein marker that can describe fibroblasts since they are a collective term to describe heterogeneous populations of cells [13][14][15].
The use of RNA sequencing (RNA-seq, including single-cell RNA-seq (scRNA-seq)) technology has identified a large number of non-protein-coding (ncRNA) genes. When the

Tissue-Specific Expressions of Protein-Coding and lncRNA Genes in Human Fibroblasts
The heterogeneity of fibroblasts has been discussed for many years, and there are currently no cell surface markers that identify all types of fibroblasts [23]. To examine the tissue specificity of fibroblasts, expression profiling was performed using microarrays [24][25][26] and RNA-seq [27]. Because microarrays are not optimal for lncRNA expression profiling as most of the microarray platforms are designed only for protein-coding genes, an RNA-seq dataset on fibroblasts from the abdomen, upper gingiva, lung, soft palate, scalp, trachea, and vocal fold was chosen for further analysis (GEO accession number, GSE140523; Ref. [27]). The objective of the original study was to examine differentially expressed genes in vocal-fold fibroblasts compared to fibroblasts from other anatomical locations. Vocal-fold fibroblasts were chosen because they are located in a more stressful tissue environment than those from other body parts, leading to differences in gene expression triggered by mechanical stress in the vocal fold [27]. In the original study, only protein-coding genes were examined, which calls for further analysis of this dataset. When this dataset was analyzed using the latest annotation file provided by the Ensembl database [28] (GRCh38.103), 2405 protein-coding genes (out of 19,796 protein-coding genes without readthrough transcripts registered under the GRCh38.103 annotation file) and 8133 lncRNA genes (out of 16,593 lncRNA genes without readthrough transcripts) were found to be expressed in at least one fibroblast cell line based on the normalized values, counts per million (CPM > 0; Supplementary Table S1). Although more than 3.4-fold lncRNA genes in the number of genes are expressed compared to protein-coding genes, the expression of lncRNA genes is significantly lower than those of protein-coding genes ( Figure 1A). A previous study showed that lncRNA genes tend to be shorter in length than protein-coding genes [29]. However, lncRNA genes are lowly expressed compared to protein-coding genes even when normalized for their length based on two measurements: Reads per kilobase of transcript per million mapped reads (RPKM) and transcripts per kilobase million (TPM) values (Supplementary Figure S1). Due to the ongoing debate on the normalization of RNA-seq data and expression values [30], CPM values are used in this study to avoid potential issues related to alternative splicing and isoform length.  When the top 20 highly expressed protein-coding ( Figure 1B) and lncRNA ( Figure 1C) genes are compared, the expression levels of highly expressed lncRNA genes are 40-fold lower than those of protein-coding genes. The group of highly expressed protein-coding genes comprises collagen (COL1A1, COL1A2, COL3A1, and COL6A3) and structural proteins (ACTB, ACTG1, DCN, FN1, ITGB1, THBS1, and VIM) as expected. Since the 34 fibroblast cell lines were derived from human cadavers of both sexes and various ages [27], XIST (X inactive specific transcript), the master regulator of X chromosome inactivation, is found among the top 20 highly expressed lncRNA genes. Of note, both protein-coding and lncRNA genes are expressed in a cell-line specific manner ( Figure 1B,C-heatmaps), although the same procedures were used for tissue explant and culture methodology [27]. This suggests that it will be difficult to identify fibroblast-specific genes simply based on the gene expression.

Gene Expression Changes of Protein-Coding and lncRNA Genes during Pulmonary Fibrosis
Because of constant environmental threats to the lungs (e.g., dusts, viruses) [31,32], fibroblasts in the lungs are exposed to various stress conditions, which are activated to sustain the remodeling of lung tissue upon injury [33]. When this balance goes awry, pulmonary fibrosis occurs, where fibroblasts play an important role [34]. Thus, pulmonary fibroblasts and their gene expression changes are increasingly investigated, especially in the context of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection [35], which provides a number of RNA-seq data available to be analyzed for lncRNA genes. For example, chronic thromboembolic pulmonary hypertension (CTEPH) is a progressive form of pulmonary hypertension caused by a blockage in the blood vessels in the lungs, leading to pulmonary fibrosis [36,37]. A recent study provided RNA-seq data from fibroblasts isolated from pairs of CTEPH thrombus and pulmonary artery adventitia of the same CTEPH patient (four CTEPH patients in total; GEO accession number, GSE149413; [38]). This study reported transforming growth factor-β (TGF-β) as the central regulator of fibrotic thrombus remodeling via neurophil-mediated inflammation [38]. When a secondary analysis of this RNA-seq dataset was performed, there were 913 up-and 712 down-regulated genes in the thrombus compared to the pulmonary artery adventitia (control) of CTEPH patients at the threshold of 2-fold and p < 0.05 (Figure 2A; Supplementary Table S2). Of these, 623 up-and 414 down-regulated genes are protein-coding genes, while 221 up-and 221 down-regulated genes encode lncRNAs. However, the expression levels of lncRNA genes are generally lower than those of protein-coding genes in all samples ( Figure 2B) as in fibroblasts from other anatomical locations ( Figure 1). TGF-β regulates the growth and differentiation of cells, including fibroblasts [39,40]. In the lungs, TGF-β is a major driver for the development of pulmonary fibrosis [41,42]. To understand the TGF-β-stimulated fibrosis process, a recent study carried out an RNA-seq of the MRC5 lung fibroblastic cell line treated with TGF-β [43]. The authors identified and characterized one lncRNA, DNM3OS (DNM3 opposite strand/antisense RNA), from this screening. However, detailed profiling of other lncRNA genes was not provided in this study [43]. To examine the expression changes of lncRNA genes upon the TGFβ stimulation, a secondary analysis of the published RNA-seq dataset (GEO accession number, GSE97829) was performed. We uncovered 1153 up-and 1316 down-regulated protein-coding genes upon the TGF-β treatment, while there are 385 up-and 523 downregulated lncRNA genes ( Figure 3A; Supplementary Table S3). Not surprisingly, lncRNA genes are expressed at a much lower level than protein-coding genes ( Figure 3B), consistent with our observations in the previous datasets (Figures 1 and 2).
When the two datasets from pulmonary fibroblasts above were compared, there were 85 up-regulated protein-coding and 24 lncRNA genes as well as 76 down-regulated proteincoding and 25 lncRNA genes shared between the two datasets: Fibroblasts from CTEPH patients and MRC5 cells treated with TGF-β ( Figure 3C; Supplementary Table S4). Gene ontology (GO) analysis of shared differentially expressed protein-coding genes showed that the up-regulated protein-coding genes involved in cell adhesion (GO:0007155) and extracellular matrix organization (GO:0030198) are enriched, which are characteristics of activated fibroblasts, myofibroblasts ( Figure 3D). Furthermore, the shared down-regulated protein-coding genes include the GO term for the Wnt signaling pathway (GO:0016055), such as FRZB (frizzled related protein), an extracellular antagonist of Wnt signaling that is known to negatively control the fibrosis in vitro [44]. In summary, the finding of proteincoding and lncRNA genes shared by the two datasets suggest that they are at least in part under the control of TGF-β signaling. When the two datasets from pulmonary fibroblasts above were compared, there were 85 up-regulated protein-coding and 24 lncRNA genes as well as 76 down-regulated protein-coding and 25 lncRNA genes shared between the two datasets: Fibroblasts from CTEPH patients and MRC5 cells treated with TGF-β ( Figure 3C; Supplementary Table S4). Gene ontology (GO) analysis of shared differentially expressed protein-coding genes showed that the up-regulated protein-coding genes involved in cell adhesion (GO:0007155) and extracellular matrix organization (GO:0030198) are enriched, which are characteristics of activated fibroblasts, myofibroblasts ( Figure 3D). Furthermore, the shared down-regulated protein-coding genes include the GO term for the Wnt signaling pathway (GO:0016055), such as FRZB (frizzled related protein), an extracellular antagonist of Wnt signaling that is known to negatively control the fibrosis in vitro [44]. In summary, the finding of protein-coding and lncRNA genes shared by the two datasets suggest that they are at least in part under the control of TGF-β signaling.

Characterization of TGF-β-Stimulated lncRNAs in Cardiac Fibroblasts
To further characterize the shared up-regulated 24 lncRNAs, another set of RNAseq data of TGF-β stimulation were analyzed (GEO accession number, GSE123018; [45]). Although this dataset is from cardiac fibroblasts, the use of fibroblasts from two anatomical sources (i.e., lungs and hearts) allows the examination of TGF-β-stimulated lncRNAs in fibroblasts in general. Analysis of RNA-seq data from TGF-β-stimulated cardiac fibroblasts indicates that a limited number of differentially expressed genes (both protein-coding and lncRNA genes) are shared between cardiac and pulmonary fibroblasts, respectively  Table S5). Among the up-regulated lncRNAs, there are only three lncRNA genes shared between cardiac and pulmonary fibroblasts based on threshold values of 2-fold and FDR-adjusted p < 0.05: AL121749.2, LINC00622, and LINC01711 ( Figure 4B). data of TGF-β stimulation were analyzed (GEO accession number, GSE123018; [45]). Although this dataset is from cardiac fibroblasts, the use of fibroblasts from two anatomical sources (i.e., lungs and hearts) allows the examination of TGF-β-stimulated lncRNAs in fibroblasts in general. Analysis of RNA-seq data from TGF-β-stimulated cardiac fibroblasts indicates that a limited number of differentially expressed genes (both protein-coding and lncRNA genes) are shared between cardiac and pulmonary fibroblasts, respectively ( Figure 4A; Supplementary Table S5). Among the up-regulated lncRNAs, there are only three lncRNA genes shared between cardiac and pulmonary fibroblasts based on threshold values of 2-fold and FDR-adjusted p < 0.05: AL121749.2, LINC00622, and LINC01711 ( Figure 4B). The lncRNA AL121749.2 (Ensembl Gene ID, ENSG00000287528) is a 1350-nt long lncRNA with four exons, located on the forward strand of chromosome 10 (chr10: 35,641,097-35,647,826). In the Ensembl database, the official description of this lncRNA gene is "novel transcript, antisense to FZD8". FZD8 (frizzled class receptor 8) is a member of the frizzled gene family that encodes receptors for the Wnt and other signaling pathways [46]. A previous study showed that TGF-β induced fibroblast activation In vitro (MRC5 cells and primary human lung fibroblasts) and bleomycin-induced fibrosis in mice (FZD8-deficient mice) is regulated via FZD8 with WNT5B as the ligand [47], suggesting that the lncRNA AL121749.2 might be a novel lncRNA implicated in this mechanism.
The lncRNA LINC00622 (Ensembl Gene ID, ENSG00000260941; long intergenic nonprotein coding RNA 622) is a single exonic lncRNA (1570 nt in length) located upstream (chr1: 119,597,702-119,599,271) of the processed pseudogene, GAPDHP33 (glyceraldehyde 3 phosphate dehydrogenase pseudogene 33). A recent study reported that LINC00622 is found in extracellular vesicles (EVs) of adipose-derived stem cells [48]. Knockdown of LINC00622 in EVs reduced tumor growth in nude mice injected with The lncRNA AL121749.2 (Ensembl Gene ID, ENSG00000287528) is a 1350-nt long lncRNA with four exons, located on the forward strand of chromosome 10 (chr10: 35,641,097-35,647,826). In the Ensembl database, the official description of this lncRNA gene is "novel transcript, antisense to FZD8". FZD8 (frizzled class receptor 8) is a member of the frizzled gene family that encodes receptors for the Wnt and other signaling pathways [46]. A previous study showed that TGF-β induced fibroblast activation In vitro (MRC5 cells and primary human lung fibroblasts) and bleomycin-induced fibrosis in mice (FZD8-deficient mice) is regulated via FZD8 with WNT5B as the ligand [47], suggesting that the lncRNA AL121749.2 might be a novel lncRNA implicated in this mechanism.
The lncRNA LINC00622 (Ensembl Gene ID, ENSG00000260941; long intergenic nonprotein coding RNA 622) is a single exonic lncRNA (1570 nt in length) located upstream (chr1: 119,597,702-119,599,271) of the processed pseudogene, GAPDHP33 (glyceraldehyde 3 phosphate dehydrogenase pseudogene 33). A recent study reported that LINC00622 is found in extracellular vesicles (EVs) of adipose-derived stem cells [48]. Knockdown of LINC00622 in EVs reduced tumor growth in nude mice injected with neuroblastoma cells possibly by regulating the expression of gamma-aminobutyric acid type A receptor subunit rho1 (GABRR1) via the androgen receptor (AR).
Another study found that LINC01711 is contained in the exosomes of esophageal squamous cell carcinoma (ESCC) [50]. Knockdown of LINC01711 in ESCC cell lines induced apoptosis, inhibited the proliferation, migration, invasion, and cell growth, while the administration of exosome-derived LINC01711 promoted tumor growth in nude mice. Mechanistically, the authors suggested that LINC01711 functions as a microRNA sponge to sequester miR-326, which targets fascin actin-bundling protein 1 (FSCN1). Furthermore, the downstream protein-coding gene, STX16, is shown to interact with the N-terminal region of CFTR (cystic fibrosis transmembrane conductance regulator) in epithelial cells [51], suggesting a possible relation to pulmonary fibrosis. In summary, all three lncRNA genes have one isoform reported in the current annotations by the Ensembl database and have not been studied for their functions in fibroblasts, which calls for detailed mechanistic studies in the context of TGF-β-induced fibrosis.

Loss-of-Function Study of TGF-β-Stimulated lncRNAs in Dermal Fibroblasts
To test whether the above TGF-β-stimulated lncRNA genes have functions during fibrosis, we performed loss-of-function experiments in dermal fibroblasts. The rationale for using fibroblasts derived from yet another anatomical location was to investigate the effects of knockdown of the lncRNA genes in fibroblasts in general. First, the dermal fibroblasts were stimulated with TGF-β1, which resulted in the up-regulation of the major collagen, COL1A1 ( Figure 5A). Next, the expression of three lncRNA genes and two upregulated protein-coding genes was examined. As shown in Figure 5B, LINC00622 and LINC01711 were highly up-regulated upon TGF-β1 stimulation, which confirmed the above selection criteria. Of note, the lncRNA candidate gene, AL121749.2, was not expressed in unstimulated nor in TGF-β1-stimulated dermal fibroblasts.
To uncover the functional importance of specific lncRNAs, it is necessary to perform loss-of-function experiments. When LINC00622 was knocked down using siRNAs and stimulated with TGF-β1 ( Figure 5C), there are 608 up-and 319 down-regulated genes when the threshold values of 2-fold and FDR-adjusted p-values <0.05 were applied to RNA-seq data ( Figure 5D). Among the up-regulated genes, GO terms related to cellular responses (e.g., ions, inflammatory, chemokines) are enriched ( Figure 5E), whereas GO terms related to cell proliferation and morphogenesis are enriched among the down-regulated genes ( Figure 5F). When the differentially expressed genes were analyzed for KEGG pathways, the mitogen-activated protein kinase (MAPK) signaling pathway (hsa04010) ( Figure 5G,H), which plays an important role in the profibrotic processes in various diseases [52][53][54][55], was enriched. These data suggest that LINC00622 is important in the regulation of the TGF-β1 stimulated fibrosis.
Similarly, when the lncRNA, LINC01711, was knocked down using siRNAs and upon stimulation with TGF-β1 ( Figure 6A), there were 529 up-and 330 down-regulated genes when the threshold values of 2-fold and FDR-adjusted p-values < 0.05 were applied to RNA-seq data ( Figure 6B). Among the up-regulated genes, the inflammatory response, especially viral related, and the associated processes, including cytokine signaling pathways, are enriched in GO analysis ( Figure 6C), whereas GO terms associated with the extracellular matrix and collagen fibril organizations are enriched in the down-regulated genes ( Figure 6D). When the differentially expressed genes were analyzed for KEGG pathways, the cytokine-cytokine receptor interaction (hsa04060) and the TNF signaling pathway (hsa04668) were enriched ( Figure 6E,F), suggesting that LINC01711 is involved in regulating inflammatory responses of fibroblasts. pathway (hsa04668) were enriched ( Figure 6E,F), suggesting that LINC01711 is involved in regulating inflammatory responses of fibroblasts.

The Resource for Protein-Coding and lncRNA Genes in Fibrosis: FibroDB
As a large number of RNA-seq data were analyzed in this study, focu

The Resource for Protein-Coding and lncRNA Genes in Fibrosis: FibroDB
As a large number of RNA-seq data were analyzed in this study, focusing especially on lncRNA genes, we have built a web application, FibroDB, to disseminate the obtained information ( Figure 7A). FibroDB is an easy-to-use web application that allows end-users to explore expression changes of protein-coding and lncRNA genes using the three mostcommonly used normalized expression values (CPM, RPKM, and TPM) ( Figure 7B). For each gene, the hyperlink is provided to GeneCards [56] for further information, while each study is linked to the data information provided by the GEO [57]. To visualize comparison of each GEO study included in FibroDB, volcano plots are generated for a specified comparison of two experimental conditions ( Figure 7C), allowing users to obtain a global image of the study being analyzed. Furthermore, differentially expressed genes can be compared and analyzed further via visual inspection by heatmaps ( Figure 7D) and based on Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. FibroDB also generates a Venn diagram to compare differentially expressed genes among studies registered in FibroDB ( Figure 7E). All data included in FibroDB are downloadable from the Download tab to allow further understanding of gene expression changes in fibroblasts and during fibrosis. comparison of two experimental conditions ( Figure 7C), allowing users to obtain a globa image of the study being analyzed. Furthermore, differentially expressed genes can be compared and analyzed further via visual inspection by heatmaps ( Figure 7D) and based on Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. FibroDB also generate a Venn diagram to compare differentially expressed genes among studies registered in FibroDB ( Figure 7E). All data included in FibroDB are downloadable from the Download tab to allow further understanding of gene expression changes in fibroblasts and during fibrosis.

Discussion
In this study, we find that (i) a large number of lncRNA genes are expressed in fibroblasts and during fibrosis; (ii) compared to protein-coding genes, the overall expression levels of lncRNA genes are much lower in fibroblasts and TGF-β-stimulated fibroblasts; (iii) although TGF-β stimulation is a common mechanism in fibrosis, only very few proteincoding and lncRNA genes share similar profiles between cardiac and pulmonary fibroblasts; and (iv) knockdown of the lncRNAs, LINC00622 and LINC01711, resulted in gene expression changes associated with cellular and inflammatory responses, respectively. However, further functional and mechanistic studies are required to understand the importance of these lncRNAs in fibrosis.
As with any other study, there are limitations to our study. First, all the RNA-seq data analyzed are of poly A-enriched sequencing, although all RNA-seq data are strandspecific sequencing. Thus, it is possible that lncRNAs without poly A tails may have higher expression than those with poly A tails. Second, we only focused on the known lncRNA genes based on the latest annotation provided by the Ensembl database. Thus, it is possible that novel lncRNA genes might have higher expression than those of proteincoding genes. Third, only one time point after the stimulation with TGF-β was investigated. More mechanistic studies are needed with longer stimulation to understand the impact of silencing the candidate lncRNAs identified in this study.
Although it is now common to employ scRNA-seq to assess the heterogeneity of cells [11,[58][59][60], such an approach is not suitable for studying the functions of lncRNAs, as it will be difficult to identify minor populations of cells with low expression of lncRNA genes as shown in this study. Thus, we intentionally excluded scRNA-seq data from FibroDB. Since the interest to study fibrosis has increased in recent years, the commands and snakemake [61] pipelines are available via the GitHub repository to allow further analysis of similar RNA-seq data.
There are several databases currently available that include expression profiles of lncRNAs. Most of these databases include the expression profiles derived from RNA-seq data of whole tissues (normal and/or tumors as in the case of C-It-Loci [62], LncBook [63], lncRNAtor [64], LncExpDB [65], RefLnc [66]) and cell lines (LncExpDB [65], lncRNAtor [64], RefLnc [66]), which is not ideal to understand the expression profiles of a certain cell type, especially in normal physiological conditions. To solve this problem, two databases that focus on a specific cell type is available: ANGIOGENES for endothelial cells [67] and RenalDB for cells in kidneys [68]. To the best of our knowledge, FibroDB is the first lncRNA database focused specifically for fibroblasts and during fibrosis. Our FibroDB web application allows the users to quickly search for lncRNAs differentially expressed in several experimental conditions. Furthermore, comparisons among different experimental settings can be carried out to narrow down the list of differentially expressed lncRNAs during fibrosis in different tissues.

RNA-Seq Data Analysis
RNA-seq data were downloaded from the Sequence Read Archive (SRA) database using SRA Toolkit [69]. The data-sets used in this study are indicated in the Results section with the corresponding accession numbers from the Gene Expression Omnibus (GEO) database. FASTQ files were preprocessed with fastp [70] (versions 0.21.0 and 0.22.0) using default settings to perform quality control, trimming of adapters, filtering by quality, and read pruning. After the preprocessing of sequencing reads, STAR [71] (versions 020201 and 2.7.9a) was used to map the reads to the reference genome (GRCh38.103). To calculate counts per million (CPM) values and derive differentially expressed genes, the R package, edgeR [72] (versions 3.30.3 and 3.32.1), was used. False discovery rate (FDR)-adjusted p-values were used for further analysis, unless stated otherwise in the text. To derive reads per kilobase of transcript per million mapped reads (RPKM) and transcripts per kilobase million (TPM) values, the R packages, GenomicFeatures [29] and edgeR, were used. The commands and programs used in this study can be found on the GitHub repository (https://github.com/heartlncrna/Analysis_of_FB_Studies).

Data Analysis and Visualization
To plot violin and volcano plots, the R-package, ggplot2 [73], was used after removing genes with zero CPM values. Volcano plots were generated using bioinfokit [74]. To draw heat maps, the MultiExperiment Viewer (MeV) [75] was used. The gene ontology (GO) terms and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways were analyzed via the Database for Annotation, Visualization, and Integrated Discovery (DAVID) v6.8 [76,77].
To silence the target lncRNAs, the following MISSION siRNAs (Sigma-Aldrich) were used: To induce fibrosis, cells were serum-starved for one day in MEM supplemented with 1% MEM Non-essential Amino Acid Solution (Sigma-Aldrich, #M7145), 1% L-Glutamine solution (Sigma-Aldrich, #G7513), and 1% Penicillin-Streptomycin (Sigma-Aldrich, #P4333) at 37 • C with 5% CO 2 . Then, the media were changed with the growth medium above, which are designated as the control (unstimulated). To stimulate fibrosis, recombinant human TGF-β1 (HEK293 derived) (Proteintech, Manchester, UK, #100-21) was added to the growth medium at the concentration of 5 ng/mL. In the case of siRNA-based knockdown, siRNA was added to the culture medium at the time of medium change. The samples were collected 24 h after the stimulation with TGF-β1 for immunostaining and isolation of total RNA.

Isolation of Total RNA and RT-PCR
The TRIzol Reagent (Thermo Fisher Scientific, Roskilde, Denmark, #15596018) was used to isolate the total RNA from cells and purified following the manufacturer's protocol. SuperScript IV VILO Master Mix with the ezDNase Enzyme (Thermo Fisher Scientific, #11766500) was used to digest the genomic DNA and reverse transcribe one µg of total RNA for each sample to synthesize the first-strand complementary DNA (cDNA). After reverse transcription, the first-strand cDNA was diluted with DNase/RNase-free water to the concentration of 1 ng/µL. A quantitative reverse transcription polymerase chain reaction (qRT-PCR) reaction was performed with 1 ng of cDNA template per reaction using PowerUp SYBR Green Master Mix (Thermo Fisher Scientific, #A25777) via the QuantStudio 6 Flex Real-Time PCR System (Thermo Fisher Scientific) with the annealing temperature at 60 • C. Relative fold expression was calculated by 2 -DDCt using ribosomal protein lateral stalk subunit P0 (PRLP0) as an internal control. The primer pairs were designed using Primer3 (http://bioinfo.ut.ee/primer3-0.4.0/, 24 December 2021) [78] and in silico validated with the UCSC In-Silico PCR tool (https://genome.ucsc.edu/cgi-bin/hgPcr, 24 December 2021) before extensive testing by the conventional RT-PCR reaction followed by running the PCR product on an agarose gel to examine for a single band of the expected size for each primer pair. The primer sequences are provided in Supplementary Table S6.

Immunocytochemistry
Cells were washed once with ice-cold phosphate-buffered saline (PBS) and fixed with 4% paraformaldehyde (Thermo Fischer Scientific, #28906) in PBS for 10 min. After three washes with ice-cold PBS, cells were permeabilized with 0.1% Triton-X 100 (Sigma-Aldrich, T8787) in PBS for 5 min and washed once with 0.1% Triton-X 100 in PBS. The primary antibody was diluted in PBS with 30% Blocker BSA (Thermo Fisher Scientific, #37525) and 0.1% Triton X-100. The incubation of the primary antibody was conducted at 4 • C for overnight. After three washes with 0.1% Triton-X 100 in PBS, the secondary antibody was diluted in 0.1% Triton-X 100 in PBS to stain at room temperature (RT) for one hour: Goat anti-Rabbit IgG (H + L) Cross-Adsorbed ReadyProbes Secondary Antibody, Alexa Fluor 488 (Thermo Fisher Scientific, #R37116). The stained cells were washed three times with 0.1% Triton-X 100 in PBS. To visualize the nuclei, DAPI (Sigma-Aldrich, #10236276001) staining was performed at room temperature for 5 min. The antibodies used in this study are provided in Supplementary Table S7. The stained cells were visualized with the Invitrogen EVOS FL Digital Inverted Fluorescence Microscope (Thermo Fisher Scientific). The obtained images were merged using the Fiji image processing package [79].

RNA-Seq Experiment
Non-directional RNA-seq was performed at Novogene (Cambridge, UK) using the NEB Next Ultra RNA Library Prep Kit for mRNA-seq library preparation followed by sequencing with the Illumina Novaseq 6000 platform with a PE150 sequencing strategy.

FibroDB Web Application
The FibroDB web application is based on the R package, Shiny [80]. The app provides three primary features: (1) Exploration of results (Explore), (2) downloads, and (3) documentation. The Explore tab displays an interactive table, created with the R package, DT (https://github.com/rstudio/DT), containing differential gene expression results from the study selected by the user. Normalized expression values were plotted via the R packages, ggplot2 [73] and plotly [81]. The table and plot are linked, such that when a user selects a row in the table, the plot changes to reflect the expression of the gene on that row.
The Explore tab also displays the volcano plot for each study for which differential gene expression could be calculated. The plot is visualized using the R package, ggplot2.
At the Explore page, the Heatmap tab displays an expression heatmap plotted via the pheatmap function from the R package, ComplexHeatmap [82]. Pathway enrichment was calculated using the en-richr function from the R package, enrichR [83][84][85][86], to query over-or under-expressed differentially expressed genes (DEGs) against the KEGG pathway database. The results are visualized in the Pathway analysis tab using the R package, ComplexHeatmap [82]. Finally, the over-or under-expressed genes are compared between studies using the venn.diagram function from the R package, VennDiagram [87], and visualized in the Comparison tab.
The processed datasets from this study are hosted on a public AWS S3 bucket. The Downloads page provides instructions and links to access these data along with verbose descriptions. The Documentation page provides instructions for the usage of the application, rendered as HTML via the R package, R Markdown [88]. FibroDB will be updated twice a year to include the latest publicly available RNA-seq datasets after manual search via the GEO database.

Statistics
Data are presented as the mean ± S.E.M. unless otherwise indicated. Two-sample, twotail, heteroscedastic Student's t-test was performed to calculate a p-value via Microsoft Excel.

Conclusions
There are many lncRNA databases currently available [22]. Yet, all these databases catalog a variety of lncRNAs in various tissues, of which some databases are aimed to uncover tissue-specificity and conservation among species [89]. Our FibroDB differs from the existing lncRNA databases by focusing on one cell type (that is, fibroblasts) and their differentiated status, myofibroblasts, which are increasingly being studied in chronic inflammatory diseases. As experimentally demonstrated in this study, part of the contents in FibroDB have been confirmed via expression profiling of fibrotic genes and upon performing loss-of-function experiments for two identified lncRNAs using dermal fibroblasts stimulated with TGF-β1. These validation data will add confidence to the information contained in FibroDB for further biological experiments for end-users to explore the roles of new lncRNAs implicated in fibrosis.

Supplementary Materials:
The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/ncrna8010013/s1, Supplementary Table S1. RNA-seq data of fibroblasts from seven anatomical locales (abdomen, upper gingiva, lung, soft palate, scalp, trachea, and vocal fold; GEO accession number, GSE140523). The average CPM values of each anatomical locale are shown for each gene; Supplementary Table S2. Differentially expressed genes in thrombus compared to pulmonary artery adventitia (control) of CTEPH patients at the threshold of 2-fold (logFC, fold change in logarithm of base 2) and p < 0.05 (PValue) (GEO accession number, GSE149413); Supplementary Table S3. Differentially expressed genes in TGF-β-treated MRC5 lung fibroblastic cell line compared to non-treated MRC5 cells at the threshold of 2-fold (logFC, fold change in logarithm of base 2) and FDR-adjusted p < 0.05 (FDR) (GEO accession number, GSE97829); Supplementary  Table S4. List of differentially expressed genes shared between two datasets: Fibroblasts from CTEPH patients and MRC5 cells treated with TGF-β; Supplementary Table S5. List of differentially expressed genes in pulmonary fibroblasts (CTEPH patients and MRC5 cells treated with TGF-β) and their expression changes in cardiac fibroblasts stimulated with TGF-β for 24 h compared to the CFs before the TGF-β stimulation (baseline). The category column is based on the RNA-seq data of pulmonary fibroblasts. The logFC column shows the logarithm of base 2 of fold change, while the FDR column displays the FDR-adjusted p-values; Supplementary Table S6. List of primers used in this study; Supplementary Table S7. List of antibodies used in this study; Supplementary Figure S1. Gene expression of tissue-specific human fibroblasts normalized to gene lengths. Data Availability Statement: The Supplementary Materials, commands, and programs used in this study can be found in the GitHub repository: https://github.com/heartlncrna/Analysis_of_FB_ Studies). All code used to generate FibroDB is available in the GitHub repository: https://github. com/Bishop-Laboratory/FibroDB.