Comparison of Oxidative and Hypoxic Stress Responsive 2 Genes from Meta-Analysis of Public Transcriptomes 3

Analysis of RNA-sequencing (RNA-seq) data is an effective means to analyze the gene 8 expression levels under specific conditions and discover new biological knowledge. More than 9 74000 experimental series with RNA-seq have been stored in public databases as of October 20, 2021. 10 Since this huge amount of expression data accumulated from past studies is a promising source of 11 new biological insights, we focused on a meta-analysis of 1783 runs of RNA-seq data under the 12 conditions of two types of stresses: oxidative stress (OS) and hypoxia. The collected RNA-seq data 13 of OS were organized as the OS dataset to retrieve and analyze differentially expressed genes 14 (DEGs). The OS-induced DEGs were compared with the hypoxia-induced DEGs retrieved from a 15 previous study. The results from the meta-analysis of OS transcriptomes revealed two genes, CRIP1 16 and CRIP3, which were particularly downregulated, suggesting a relationship between OS and zinc 17 homeostasis. The comparison between meta-analysis of OS and hypoxia showed that several genes 18 were differentially expressed under both stress conditions, and it was inferred that the downregu19 lation of cell cycle-related genes is a mutual biological process in both OS and hypoxia. 20


Introduction
Oxidative stress (OS) is characterized by an imbalance between oxidants and antiox- 24 idants, caused by an increase in the levels of reactive oxygen species (ROS) in a biological 25 system. ROS comprise free radicals that can damage cellular molecules and disrupt ho-26 meostasis when antioxidants are downregulated, or ROS levels are upregulated. Chronic 27 OS has been observed in various diseases such as Parkinson's disease, hepatitis, and can-28 cer [1 -5]. 29 Due to its strong relationship with human health, the mechanisms of OS have been 30 extensively investigated to provide biological and medical knowledge. These include a 31 the mechanism of DNA damage by the highly reactive hydroxyl radicals, the role of OS 32 in carcinogenesis appearance, and the increase in OS-inducible inflammatory cells by ac-33 tivation of specific transcription factors such as NF-E2 related factor-2 (NRF2) [6,7]. The 34 past studies also have resulted in 435 genes in Homo sapiens annotated with the term 35 "GO:0006979 response to oxidative stress" in gene ontology (GO). On the other hand, the 36 broadness of OS-inducible factors and the dynamics of ROS in biological systems make 37 the OS studies challenging and complicated [8]. In spite of attempts to list up and catego-38 rize the OS-related compounds, contributing factors for OS involve enormous range of 39 both external and internal sources [1] and distinguishing oxidative and non-oxidative 40 sources is challenging. Therefore, the present study focused on analyzing the common 41 feature among various sources of OS from the perspective of changes in gene expression. 42 As for another underdeveloped area of OS studies, a clear line between other types of 43 stresses and OS has not been defined. It necessary to compare the OS and other stresses 44 such as hypoxia, which is also an oxygen-related stress condition.

45
Hypoxia is characterized by reduced oxygen availability in tissues and is known to 46 increase ROS levels through changes in signaling cascades and protein expression [9]. A 47 previous study has successfully attained the collective intelligence of public hypoxic tran-48 scriptomes by analyzing 944 runs of RNA-seq data [10]. This approach, a statistical anal-49 ysis of combined results from multiple studies, called meta-analysis has attracted atten-50 tions. It is because the data-driven nature of meta-analysis makes it possible to discover 51 new findings that are difficult to achieve with traditional hypothesis-driven research 52 methods [11]. The dataset and results obtained in the meta-analysis of hypoxia are valua-53 ble sources for both of hypothesis-and data-driven research.

54
To discover novel areas by utilizing valuable open sources, we collected OS tran-55 scriptomes of human cultured cells from public databases and performed a meta-analysis. 56 This study aimed to investigate the key genes and characteristics for not only specifically 57 OS, but also comparison between OS and hypoxia, by analyzing the differentially ex-58 pressed genes (DEGs) from the meta-analysis of both of OS and hypoxia, based on 1783 59 RNA-seq data (839 from this study and 944 from our previous study of meta-analysis in 60 hypoxia [10]). These investigated genes, the OS curated dataset for OS, and the method 61 described in this study to compare the results of multiple meta-analyses are expected to 62 be valuable sources for promoting future studies.

65
As a first step to access and view the integrated expression metadata from public 66 databases, we initially used a graphical web tool, All Of gene Expression (AOE) [12]. AOE 67 provides integrated information about gene expression data integrated from Gene Expres-68 sion Omnibus (GEO) [13], ArrayExpress [14], Genomic Expression Archive [15], and 69 RNA-seq data only archived in the Sequence Read Archive (SRA) [16]. Extensive key-70 words, including "oxidative stress", rotenone, paraquat, hydrogen peroxide (H2O2), UV, 71 lipopolysaccharide, arsenite, and deoxynivalenol, were searched in GEO to collect a list of 72 experiment data series related to the RNA-seq data of OS in humans. Then, we manually 73 curated the adequate data with three main criteria: relation to the definition of oxidative 74 stress, relation to an increase in the ROS level, and availability of the corresponding con-75 trol data (normal state) to pair the OS data. 76 2.2 RNA-seq data retrieval, processing, and quantification 77 We used ikra for RNA-seq data retrieval, processing, and quantification. Ikra is an 78 automated pipeline program for RNA-seq data of Homo sapiens and Mus musculus [17]. 79 Ikra automates the following processes: conversion of the collected SRA format data to 80 FASTQ formatted files using fasterq-dump (version.2.9.6) [18], quality control and trim-81 ming of transcript reads with trim-galore (version 0.6.6) [19], and quantification of the 82 transcripts in a unit of transcripts per million (TPM) by salmon (version 1.4.0) [20] with 83 reference transcript sets in GENCODE Release 31 (GRCh38.p12). changed. When the ON_ratio was greater than the threshold, the gene was considered 93 upregulated, and when the ON_ratio was less than the threshold, the gene was treated 94 as downregulated, otherwise the gene was categorized as unchanged. We adopted 5 and 95 10-fold thresholds for upregulation and 0.2 and 0.1-fold thresholds for downregulation 96 after testing several thresholds.

97
To take all the collected RNA-seq data pairs into account, we calculated an Oxida-98 tive stress-Normal state score (termed as ON_score [11]) based on ON_ratio values using 99 the equation (2): ON_ratio and ON_score were previously introduced in the meta-analysis of OS tran-101 scriptome in insects [11] and the meta-analysis of hypoxic transcriptome [10] (termed as 102 HN-ratio and HN-score in the meta-analysis of hypoxia).

Analysis and comparison of gene sets 104
Differentially expressed gene sets were analyzed by using the web tool, Metascape 105 [21], which performs gene set enrichment analysis. We examined the corresponding terms 106 and p-values obtained using the gene set enrichment analysis. We also used a web Venn 107 diagram tool [22] to search and visualize the matched genes among different gene sets.  110 We collected 839 RNA-seq data and curated them as the OS dataset with 386 pairs of 111 OS and normal state transcriptome data. As OS is caused by various factors, sources of OS 112 in the OS dataset include hydrogen peroxide (H2O2), UV, rotenone, lipopolysaccharide, 113 arsenite, radiation, NRF2 knockdown/KO, BRD4 KO, deoxynivalenol, palmitate, cad-114 mium, methylmercury, zinc dimethyldithiocarbamate, aging, paraquat, and others (Table 115 1). The proportion of the data pairs of hydrogen peroxide, UV, and rotenone against the 116 total 386 pairs was as follows: 25%, 15%, and 12%, respectively. The percentage of the 117 samples derived from cancer cells was 18% (71 pairs out of 386 pairs). Other metadata 118 about the OS dataset such as each SRA project ID, SRR ID, cell type, concentration of treat-119 ment, hours of treatment, and library type of sequencing are shown in figshare [23].

Verifying the Characteristics of DEGs by the OS dataset 129
A schematic view of the analysis is shown in (Figure 1). The most upregulated 493 130 genes and the most downregulated 492 genes, in a total of 985 genes (5% of the total cod-131 ing genes in GENCODE Release 31 (GRCh38.p12)), were retrieved by ON_score 10 as 132 DEGs. We performed gene set enrichment analysis using metascape to visualize the char-133 acteristics of the DEGs. The analysis showed that the 493 most upregulated genes by OS 134 were related to "GO:0009617: response to bacterium" and "M5885: NABA matrisome as-135 sociated" (Figure 2a). The 492 most downregulated genes by OS were related to 136 "GO:0000280 nuclear division" and "R-HAS-69278: Cell Cycle, Mitotic" (Figure 2b). We 137 then found that 32 out of 985 genes were common to genes annotated with GO:0006979 138 (response to oxidative stress). The most upregulated genes common with GO annotation 139 were IL6, PTGS2, and MMP3, and the most downregulated genes common with GO an-140 notation were CDK1, SELENOP, and KLF2 (Figure 2c). The same procedure to verify the 141 DEGs retrieved by ON_score 5 was also performed [23]. The use of ON_score 5 reveals a 142 gene set that includes genes not as differentially expressed as ON_score 10. This shows 143 the broader characteristics of the OS. We used ON_score 5 in the analysis of 3.4 comparison 144 of meta-analysis results by OS and hypoxia. 145

Evaluation of DEGs by oxidative stress 146
To evaluate the genes exceptionally expressed by OS, the parameter of ON_score 10 147 was applied to retrieve 985 DEGs. 32 genes that were already annotated with the 148 GO:0006979 (response to oxidative stress) were excluded from the DEGs, thus revealing 149 OS-related genes have not yet attracted attention (Figure 1a). The most upregulated 10 150 genes and the most downregulated 10 genes were retrieved and analyzed (Figure 1a, Fig-151 ure 3). Five out of the 10 most downregulated genes (H2BC14, PIMREG, KIF20A, CDC20, 152 and H2AC14) were related to cell cycle. Two of them (H2BC14 and H2AC14) encode the 153 core components of histones. In addition, two genes encoding zinc binding domains 154 (CRIP1 and CRIP3) are included in the list of the 10 most downregulated genes. In contrast, 155 the most 3 upregulated genes were CCL20, CXCL8, and CXCL1, encoding C-C motif chem-156 okine-20, interleukin-8, and growth-regulated alpha protein respectively. Genes that re-157 spond to inflammation were included in the most upregulated genes.     165 A schematic description of the retrieval and analysis of the downregulated genes in 166 both OS and hypoxia is shown in Figure 1b. We collected 985 DEGs of OS and hypoxia 167 using the ON_score and HN-score. Each DEGs were divided into two gene sets: 493 most 168 upregulated genes and 492 most downregulated genes. The four gene sets derived from 169 the two types of stress conditions were compared using Venn diagrams to show the com-170 mon differentially expressed genes (Figure 4a). We found that 44 genes were upregulated 171 in both stress conditions (termed as HN_up ON_up), 50 genes were downregulated in 172 both stress conditions (termed as HN_down ON_down), 11 genes were upregulated in 173 hypoxia but downregulated in OS (termed as HN_up ON_down), and 8 genes were up-174 regulated in OS but downregulated in hypoxia (termed as HN_down ON_up). The num-175 ber of genes upregulated or downregulated in both stress conditions was greater than the 176 number of genes upregulated or downregulated under either one of the stress conditions. 177 The characteristics of each gene set in common were analyzed by performing gene 178 set enrichment analysis using metascape. "R-HAS-69278: Cell Cycle, Mitotic" and 179 "GO:1903047: mitotic cell cycle process" are the most enriched terms with log10(p-value) 180 of -19.21 and -18.93 for HN_down ON_down (Figure 4b). HN_up ON_up is related to the 181 terms, "M145: PID P53 Downstream pathway" and "M166: PID ATF2 pathway" (Figure 182  4c). HN_up ON_down and HN_down ON_up included 11 genes and 8 genes respectively. 183 A list of genes in each gene set is shown in figshare [23].

189
In this study, we curated the 386 pairs of OS-related RNA-seq data collected from 190 public databases. The collected data were systematically processed and analyzed to iden-191 tify the DEGs related to OS. Gene set enrichment analysis was performed to identify and 192 confirm the characteristics of the DEGs. In addition, we implemented a new approach to 193 analyze the relationship between the two types of stresses, OS and hypoxia, by comparing 194 the results of both meta-analyses [10]. We compared the genes upregulated and downreg-195 ulated by hypoxia and OS to obtain four new gene sets, HN_up ON_up, HN_down 196 ON_down, HN_up ON_down, and HN_down ON_up. Each gene set was analyzed using 197 gene set enrichment analysis.

198
Meta-analysis of the OS dataset revealed two interesting genes encoding cysteine-199 rich proteins (CRIP1 and CRIP3) that were the 10 th and 5 th most downregulated by OS, 200 respectively. Each encoded protein contains zinc-binding domains, and the protein en-201 coded by CRIP1 is considered to act as a zinc transporter and absorption [24,25]. Previous 202 studies have reported several roles for zinc in antioxidant defense systems. For example, 203 zinc inhibits the enzyme nicotinamide adenine dinucleotide phosphate oxidase (NADPH-204 Oxidase) and promotes the synthesis of metallothionein which contributes to the reduc-205 tion of ROS [26]. Zinc is also known as a component of the enzyme superoxide dismutase 206 (SOD) which acts to reduce and maintain ROS levels in cells [26]. On the other hand, ex-207 cess zinc exhibits other types of toxicities leading to the symptoms such as nausea, vom-208 iting, fever, and headaches [27]. Therefore, zinc homeostasis is one of the key biological 209 systems for preventing various types of stresses. As the proteins encoded by CRIP1 and 210 CRIP3 contain zinc-binding domains, we can assume that they participate in the regula-211 tion of zinc homeostasis. Based on this hypothesis and the results of this study, we suggest 212 that the regulation of zinc homeostasis is impaired in OS due to decreased expression of 213 CRIP1 and CRIP3. Since zinc deficiency is known to be a cause of OS [3,28], we speculate 214 that the downregulation of CRIP1 and CRIP3 is affected by OS-induced pathways that 215 contribute to the reduced availability of zinc in cells. Uncovering the functions of CRIP1 216 and CRIP3 could be a way to clarify some of the relationships between OS and zinc ho-217 meostasis, which may promote the development or the prevention of OS and zinc home-218 ostasis-related diseases such as atherosclerosis [29], Parkinson's disease [30], cancer, and 219 hepatitis virus infection [31,32]. 220 The comparison of the meta-analysis results by two types of stresses, OS and hypoxia, 221 revealed gene sets that were found as differentially expressed in both stresses. Particularly 222 the gene set downregulated in both stresses showed distinct characteristic with cell cycle 223 (Figure 4b). This result supports the previous biological findings that DNA damage in-224 duced by increased ROS levels cause cell cycle arrest or apoptosis [33,34]. In addition, an 225 increase in ROS production in mitochondria is known to be a common event in both OS 226 and hypoxia [35]; therefore, the downregulation of cell cycle-related genes was an ex-227 pected result. Furthermore, meta-analysis of the OS dataset revealed five cell cycle-related 228 genes: H2BC14, PIMREG, KIF20A, CDC20, and H2AC14 that were respectively 2 nd , 3 rd , 6 th , 229 7 th , and 9 th most downregulated by OS, supporting the above observation by showing that 230 DEGs associated with OS are related to the cell cycle. As these ten OS-induced downreg-231 ulated genes were not included in the genes common to hypoxia, further research is 232 needed to clarify whether the expression of these genes is unique to OS or shared by types 233 of stresses other than hypoxia. 234 The results of this study may play a role in elucidating the causative mechanisms and 235 development of treatments for such diseases as atherosclerosis (OS and zinc homeostasis 236 related), chronic kidney disease, and metabolic syndrome (both OS and hypoxia related) 237 [36,37] by further studies searching on the functions of the important genes revealed. As 238 the number of public expression data increases, the more accurate and detailed infor-239 mation about genes that respond to OS can be obtained by updating the OS dataset in the 240 future. We have also shown the possibility of revealing information about the relation-241 ships between the types of stresses by comparing the results from the meta-analysis. Thus, 242 the use of collective intelligence including the results of this study, which will continue to 243 be produced in the future, makes it possible to efficiently promote studies on the search 244 for key pathways, for causes of diseases, and treatments of diseases.