Gene Coexpression Analysis Identifies Genes Associated with Chlorophyll Content and Relative Water Content in Pearl Millet

Pearl millet is a significant crop that is tolerant to abiotic stresses and is a staple food of arid regions. However, its underlying mechanisms of stress tolerance are not fully understood. Plant survival is regulated by the ability to perceive a stress signal and induce appropriate physiological changes. Here, we screened for genes regulating physiological changes such as chlorophyll content (CC) and relative water content (RWC) in response to abiotic stress by using “weighted gene coexpression network analysis” (WGCNA) and clustering changes in physiological traits, i.e., CC and RWC associated with gene expression. Genes’ correlations with traits were defined in the form of modules, and different color names were used to denote a particular module. Modules are groups of genes with similar patterns of expression, which also tend to be functionally related and co-regulated. In WGCNA, the dark green module (7082 genes) showed a significant positive correlation with CC, and the black (1393 genes) module was negatively correlated with CC and RWC. Analysis of the module positively correlated with CC highlighted ribosome synthesis and plant hormone signaling as the most significant pathways. Potassium transporter 8 and monothiol glutaredoxin were reported as the topmost hub genes in the dark green module. In Clust analysis, 2987 genes were found to display a correlation with increasing CC and RWC. Furthermore, the pathway analysis of these clusters identified the ribosome and thermogenesis as positive regulators of RWC and CC, respectively. Our study provides novel insights into the molecular mechanisms regulating CC and RWC in pearl millet.


Introduction
The global human population is predicted to reach 9 billion by 2050, and at the same time, the global temperature is expected to increase by 1 degree to 4 • C [1]. Given these predictions, there is a need to study abiotic-stress-tolerant staple food crops such as pearl millet to ensure food security. Pearl millet is an agronomically robust crop with an excellent nutritional profile and exceptional abiotic stress tolerance capacity [2]. Pearl millet can flower at 42 • C, grow in 250 mM of NaCl, and produces grain at mean precipitations of as low as 250 mm [3]. Despite these characteristics, pearl millet is often considered an 2 of 10 orphan crop as it lags behind other staple crops in research and development [4][5][6]. Since its genome sequencing in 2017 [7], substantial research on pearl millet has been carried out.
Several research efforts have focused on understanding pearl millet's physiological and molecular responses to abiotic stresses [8]. A previous physiological study by Shinde et al. revealed that the pearl millet tolerant line ICMB 01222 had a higher growth rate and accumulated higher sugar in leaves under salinity stress than the susceptible line [9]. A study using drought-responsive pearl millet lines exhibited that drought-susceptible pearl millet line ICMB 863 showed a greater reduction in CC or greenness than droughttolerant line ICMB 843. This study also reported photosynthesis, plant hormone signal transduction, and mitogen-activated kinase signaling as a drought-responsive pathway [10]. The function of several abiotic-stress-responsive genes and small RNAs has been studied in several plants [11] and pearl millet [12][13][14][15][16]. However, knowledge of molecular mechanisms regulating physiological responses is equally critical and important for ensuring the quality and yield of plants. Technological advancements in sequencing have created unprecedented opportunities to study these molecular mechanisms.
WGCNA is a widely used free-scale coexpression network analysis technique that constructs the modules containing genes, which shows a correlation with complex traits [17]. WGCNA has been widely used to study complex plant traits and their correlation with gene expression data [18]. In rice, WGCNA analysis identified CaM (calmodulin), DUF630/632 (domain of unknown function 630/632), CHL27 (Chlamydomonas 27), and LEA4-5 (Late Embryogenesis Abundant 4-5) as hub genes (central regulators) of salt-stress-related traits [19]. These hub genes will be useful for developing salt-tolerant rice genotypes.
In the current study, stress-induced physiological changes in CC and leaf water content were associated with gene expression data. We performed the WGCNA analysis [17] to generate two coexpression networks incorporating these physiological variables to determine modules of genes whose expression was regulated in concert with the physiological changes. Two interesting coexpression modules were identified, which show a positive and negative correlation with CC and RWC. For comparison, we performed the coexpression analysis using Clust [20]. Metabolic pathway analyses of these modules and clusters were also performed. In the future, our findings might help researchers to develop strategies for the biotechnological improvement of stress tolerance in pearl millet.

Transcript Quantification
After quality assessment and filtering, such as removing adaptor sequences and discarding low-quality reads, 89.2% of clean reads were generated on average. These clean reads were processed for transcriptome analysis. The clean reads mapped to the pearl millet genome. The average percentage of mappable reads per sample was 37.27%. Since transcripts per million (TPM) is the most used normalization method, which normalizes all the reads within a sample, the sum of all the reads would be exactly 1,000,000. The TPM for all 38,396 pearl millet genes was estimated, and the average TPM per sample was 26

Weighted Gene Coexpression Network Analysis (WGCNA)
The WGCNA analysis constructed 18 modules (clusters of coexpressed genes) differentiated by color. Modules are groups of genes with a similar expression pattern and are also functionally related and co-regulated. The WGCNA also allowed us to associate the correlation between the genes of each module and traits. Figure 1 shows the association between modules and characteristics, typically representing Pearson's correlation coefficients measured between every module (groups of genes with a similar expression pattern and co-regulation) and physiological characteristic. Because of the vast amount of data, we decided to focus on two modules, i.e., dark green and black. The dark green module (7082 genes) shows a positive correlation with CC (R = 0.71, p < 0.002). In contrast, the black module (genes) has a negative correlation with both traits (with CC, R = −0.51, p < 0.04 and with RWC, R = −0.56, p < 0.02). Data for all genes, their respective modules (18 eigengene modules), and correlation values are given in Supplementary File S3. The hub genes within the dark green module related to CC were detected. The top five hub genes were the potassium transporter 8, nonothiol glutaredoxin, chaperonin chloroplastic, oxoglutarate-dependent dioxygenase, and uncharacterized protein ( Figure 2). correlation between the genes of each module and traits. Figure 1 shows the association between modules and characteristics, typically representing Pearson's correlation coefficients measured between every module (groups of genes with a similar expression pattern and co-regulation) and physiological characteristic. Because of the vast amount of data, we decided to focus on two modules, i.e., dark green and black. The dark green module (7082 genes) shows a positive correlation with CC (R = 0.71, p < 0.002). In contrast, the black module (genes) has a negative correlation with both traits (with CC, R = −0.51, p < 0.04 and with RWC, R = −0.56, p < 0.02). Data for all genes, their respective modules (18 eigengene modules), and correlation values are given in Supplementary File S3. The hub genes within the dark green module related to CC were detected. The top five hub genes were the potassium transporter 8, nonothiol glutaredoxin, chaperonin chloroplastic, oxoglutaratedependent dioxygenase, and uncharacterized protein ( Figure 2).

Gene Clustering Analysis
Cluster analysis results were divided into two categories: CC-related genes and RWC-related genes. For CC, 12 clusters were formed (C0 to C11). The largest cluster was C0 (4699 genes), while the smallest cluster was C11 (59 genes) (Supplementary Figure S1). Among these clusters, C8 (603 genes) and C2 (835 genes) were selected for pathway analysis as they exhibited either positively or negatively coexpressed linear expression patterns with increasing chlorophyll content. For CC, the list of genes in each cluster is given in Supplementary File S4.
Similarly, for RWC, in total, 11 clusters were formed (C0 to C11). The largest cluster was C0 (3090 genes), while the smallest cluster was C3 (66 genes) (Supplementary Figure  S2). From these 11 clusters, C11 (828 genes) and C5 (721 genes) were selected for further analysis as they exhibited linear expression patterns either positively or negatively coexpressed with increasing RWC. For RWC, the list of genes present in each cluster is given in Supplementary File S5.

Metabolic Pathway Analysis
Pathway analysis of genes in the "darkgreen" module (positively correlated with chlorophyll content) showed them to be enriched in metabolic pathways such as the ribosome, plant hormone signal transduction, starch and sugar metabolism, photosynthesis, and MAPK signaling. Ribosome synthesis was the most enriched pathway with 48 genes associated ( Figure 3). In the dark green module, hub genes were identified according to a GS > 0.8 and an MM > 0.9. The table represents the top five hub genes.

Gene Clustering Analysis
Cluster analysis results were divided into two categories: CC-related genes and RWCrelated genes. For CC, 12 clusters were formed (C0 to C11). The largest cluster was C0 (4699 genes), while the smallest cluster was C11 (59 genes) (Supplementary Figure S1). Among these clusters, C8 (603 genes) and C2 (835 genes) were selected for pathway analysis as they exhibited either positively or negatively coexpressed linear expression patterns with increasing chlorophyll content. For CC, the list of genes in each cluster is given in Supplementary File S4.
Similarly, for RWC, in total, 11 clusters were formed (C0 to C11). The largest cluster was C0 (3090 genes), while the smallest cluster was C3 (66 genes) (Supplementary Figure S2). From these 11 clusters, C11 (828 genes) and C5 (721 genes) were selected for further analysis as they exhibited linear expression patterns either positively or negatively coexpressed with increasing RWC. For RWC, the list of genes present in each cluster is given in Supplementary File S5.

Metabolic Pathway Analysis
Pathway analysis of genes in the "darkgreen" module (positively correlated with chlorophyll content) showed them to be enriched in metabolic pathways such as the ribosome, plant hormone signal transduction, starch and sugar metabolism, photosynthesis, and MAPK signaling. Ribosome synthesis was the most enriched pathway with 48 genes associated ( Figure 3).
In linearly coexpressed clusters, the ribosome, protein processing in the endoplasmic reticulum, plant-pathogen interaction, plant hormone signal transduction, and phenylpropanoid biosynthesis were designed as positive regulators of RWC. Endocytosis, starch and sucrose metabolism, phenylalanine, tyrosine and tryptophan biosynthesis, the mRNA surveillance pathway, and RNA degradation were negative regulators of RWC. (Figure 4). In the case of CC clusters, thermogenesis, aminoacyl-tRNA biosynthesis, glycolysis/gluconeogenesis, an amino sugar, nucleotide sugar metabolism, and pyruvate metabolism were positive regulators of chlorophyll content, whereas plant hormone signal transduction, protein processing in the endoplasmic reticulum, ribosome biogenesis in eukaryotes, nucleocytoplasmic transport, and phenylpropanoid biosynthesis were negative regulators of CC ( Figure 5). In linearly coexpressed clusters, the ribosome, protein processing in the endoplasmic reticulum, plant-pathogen interaction, plant hormone signal transduction, and phenylpropanoid biosynthesis were designed as positive regulators of RWC. Endocytosis, starch and sucrose metabolism, phenylalanine, tyrosine and tryptophan biosynthesis, the mRNA surveillance pathway, and RNA degradation were negative regulators of RWC. (Figure 4). In the case of CC clusters, thermogenesis, aminoacyl-tRNA biosynthesis, glycolysis/gluconeogenesis, an amino sugar, nucleotide sugar metabolism, and pyruvate metabolism were positive regulators of chlorophyll content, whereas plant hormone signal transduction, protein processing in the endoplasmic reticulum, ribosome biogenesis in eukaryotes, nucleocytoplasmic transport, and phenylpropanoid biosynthesis were negative regulators of CC ( Figure 5).   In linearly coexpressed clusters, the ribosome, protein processing in the endoplasmic reticulum, plant-pathogen interaction, plant hormone signal transduction, and phenylpropanoid biosynthesis were designed as positive regulators of RWC. Endocytosis, starch and sucrose metabolism, phenylalanine, tyrosine and tryptophan biosynthesis, the mRNA surveillance pathway, and RNA degradation were negative regulators of RWC. (Figure 4). In the case of CC clusters, thermogenesis, aminoacyl-tRNA biosynthesis, glycolysis/gluconeogenesis, an amino sugar, nucleotide sugar metabolism, and pyruvate metabolism were positive regulators of chlorophyll content, whereas plant hormone signal transduction, protein processing in the endoplasmic reticulum, ribosome biogenesis in eukaryotes, nucleocytoplasmic transport, and phenylpropanoid biosynthesis were negative regulators of CC ( Figure 5).   Our WGCNA and Clust analysis results aligned well with each other as we observed that the ribosome and plant hormone signal transduction are positive regulators of RWC in both analyses.

Discussion
Pearl millet is a staple crop of arid and semi-arid tropics and is known for its unique ability to tolerate abiotic stress. Various studies used a genomic approach to identify genes and pathways related to abiotic stress in pearl millet [21][22][23][24][25][26]. However, none of the studies performed coexpression analysis to discover genes that regulate abiotic-stress-related traits. CC and RWC are abiotic-stress-related traits and change rapidly during abiotic stresses.
Integrating these dynamic changes in physiology with the transcriptome data using the WGCNA and Clust approach uncovered genes regulating the ribosome, plant hormone signal transduction, protein processing, thermogenesis, and endocytosis. Among those pathways, the ribosome is the most important as it is a positive regulator of RWC in both WGCNA and Clust analysis. The ribosomes are responsible for synthesizing proteins [27]. The functioning of the ribosome during stress is crucial for the timely synthesis of stress-responsive proteins [28,29]. Different stress-and defense-related pathways are deactivated upon ribosomal impairment [30]. This crosstalk between the ribosome and the stress-responsive protein serves as a novel approach to studying stress-tolerant plants. Plant hormone signaling and protein processing were other pathways reported in WGCNA modules and clusters. As plant hormones are early responders to stress stimulus [31], we suggest early sensing of stress by pearl millet at the molecular level.
Endocytosis, protein processing in the endoplasmic reticulum, and plant hormone signaling were the pathways that showed a negative correlation with both CC and RWC. A previous study in the Medicago plant reported the relation between drought stress and endocytosis [32] where endocytosis affects membrane lipids composition during stress. The endoplasmic reticulum is the organelle where most of the proteins of a cell are synthesized. Such proteins later play a significant role [33] in stress response and protection.

Discussion
Pearl millet is a staple crop of arid and semi-arid tropics and is known for its unique ability to tolerate abiotic stress. Various studies used a genomic approach to identify genes and pathways related to abiotic stress in pearl millet [21][22][23][24][25][26]. However, none of the studies performed coexpression analysis to discover genes that regulate abiotic-stress-related traits. CC and RWC are abiotic-stress-related traits and change rapidly during abiotic stresses.
Integrating these dynamic changes in physiology with the transcriptome data using the WGCNA and Clust approach uncovered genes regulating the ribosome, plant hormone signal transduction, protein processing, thermogenesis, and endocytosis. Among those pathways, the ribosome is the most important as it is a positive regulator of RWC in both WGCNA and Clust analysis. The ribosomes are responsible for synthesizing proteins [27]. The functioning of the ribosome during stress is crucial for the timely synthesis of stressresponsive proteins [28,29]. Different stress-and defense-related pathways are deactivated upon ribosomal impairment [30]. This crosstalk between the ribosome and the stressresponsive protein serves as a novel approach to studying stress-tolerant plants. Plant hormone signaling and protein processing were other pathways reported in WGCNA modules and clusters. As plant hormones are early responders to stress stimulus [31], we suggest early sensing of stress by pearl millet at the molecular level.
Endocytosis, protein processing in the endoplasmic reticulum, and plant hormone signaling were the pathways that showed a negative correlation with both CC and RWC. A previous study in the Medicago plant reported the relation between drought stress and endocytosis [32] where endocytosis affects membrane lipids composition during stress. The endoplasmic reticulum is the organelle where most of the proteins of a cell are synthesized. Such proteins later play a significant role [33] in stress response and protection.
We identified hub genes in the dark green module, which were highly correlated with the CC trait. The topmost hub gene we found in this study is potassium transporter 8. Potassium transporters protect plant leaves from Na+ overaccumulation and salt stress during salt stress [34]. Future experiments must validate the modules, clusters, metabolic pathways, and hub genes discovered in the present study.

RNA Sequencing Data Preprocessing
From our previously published study [10,12], raw sequencing reads were downloaded from the NCBI-SRA database using the SRA toolkit. The quality check of raw reads was performed using FastQC (https://www.bioinformatics.babraham.ac.uk/projects/ fastqc/ accessed on 1 August 2022). Adaptors and poor-quality reads were filtered using trimmomatic [35]. Clean reads were then used for mapping and quantifying transcript abundance.

Transcript Quantification
The transcriptome-wide quantification in the form of transcripts per million (TPM) was performed using Salmon v1.6.0 [36]. The pearl millet reference transcriptome file for mapping and quantification was obtained from the International Pearl Millet Genome Sequencing Consortium (https://cegresources.icrisat.org/data_public/PearlMillet_Genome/v1.1/ accessed on 15 August 2022). Prior to mapping and transcription quantitation, the pearl millet reference transcriptome file was indexed.

Weighted Gene Coexpression Network Analysis (WGCNA)
The R package, weighted gene correlation network analysis (WGCNA), was used to construct a gene coexpression network and to find coexpressed genes [14]. Gene expression data and TPM of 16 plants with two different traits (CC and RWC) were used for WGCNA analysis. Trait data used in this study were also taken from our previous study. (Supplementary File S1). At first, the genes with low TPM counts and outliers were filtered. To choose modules (group of genes) associated with traits of interest, i.e., CC and RWC, Pearson correlation analysis was determined between each module's "eigengene" and the traits. Modules with a module-trait correlation >0.5 or <−0.5 for at least one trait (p ≤ 0.05) were considered significant. Gene significance (GS) was calculated for each gene as the correlation between gene expression counts and traits. Hub genes were identified by choosing genes with the highest gene significance and module membership in the significant module. After this, we had higher module membership, gene significance, and gene number in dark green module. We selected only dark green for hub gene identification.

Gene Clustering Analysis
Gene expression data were clustered based on physiological measurements, i.e., CC and RWC using the Python package Clust v1.8.4. The plants were divided into four groups for chlorophyll content (relative greenness, SPAD reading), i.e., 10-14, 20-24, 24-28, and 28-32. The groups of genes that showed increasing or decreasing trend as CC changed were selected for pathway analysis. Similarly, for RWC (relative water content in leaf, in percentage), plants were divided into three groups, i.e., 40-50, 50-70, and 70-90, and changes in gene expression were recorded.

Metabolic Pathway Analysis
We selected the significantly associated modules (R > 0.5 or < −0.5, with a p-value of 0.05) and gene clusters for the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. The KEGG pathway analyses were performed by submitting the nucleotide sequences of modules and clusters to KEGG Automatic Annotation Server (KAAS server located at http://www.genome.jp/kegg/kaas/ accessed on 15 September 2022) [37].

Conclusions
This study's most relevant finding is identifying WGCNA modules (dark green (7082 genes) and black (1393 genes)) and clusters. Modules are defined as groups of genes with a similar expression pattern and are similar in function and co-regulation. These modules and clusters are enriched in the endoplasmic reticulum's metabolic pathways such as the ribosome synthesis, endocytosis, plant hormone, signal transduction, and protein processing. Based on the current finding, we postulate that during environmental stress, pearl millet can sense the stress using the hormones and transduces this signal to initiate the appropriate physiological and molecular responses and cope with stress-related perturbation. Moreover, the ribosome is involved in the synthesis of stress-responsive proteins during a stressed condition. Potassium transporter 8 and monothiol glutaredoxin were the most significant hub genes in the dark green module. The modules, metabolic pathways, and hub genes identified in this study provide guidelines for improving pearl millet's abiotic stress tolerance through future molecular breeding. With the discovery of these modules, candidate hub genes can now be paired with various molecular techniques, such as gene editing, to develop climate-smart crops.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/plants12061412/s1. Figure S1. Clust analysis of gene clusters related to chlorophyll content. A total of 11 clusters were identified. The x-axis shows CC, and the y-axis shows −log 10 of transcript per million values. Figure S2. Clust analysis of gene clusters related to RWC. A total of 11 clusters were identified. The x-axis shows RWC, and the y-axis shows −log 10 of transcript per million values. File S1. Physiological traits (CC and RWC) of 16   Data Availability Statement: In this study, publicly available sequencing data sets were analyzed. Bio Project ID PRJNA419859 and SRA accession number SRP128956.