Gene Set Subtraction Reveals 633 Candidate Genes for Bamboo Culm Wall Thickening

Abundant research has been conducted on the physiological, biochemical, and anatomical aspects of bamboo culm wall thickening, but its molecular mechanism has not yet been investigated. In this study, we performed whole-genome resequencing of Phyllostachys edulis ‘Pachyloen’, Phyllostachys nidularia f. farcta, Phyllostachys heteroclada f. solida with significantly thicker culm walls, and Schizostachyum dumetorum var. xinwuense with extremely thin culm walls. Moreover, we pioneered the innovative use of gene set subtraction to explore candidate genes that regulate bamboo culm wall thickening. A candidate gene set, containing 633 genes, was obtained by eliminating shared genes that help maintain physiological processes after alignment with the P. edulis reference genome. Starch and sucrose, oxidative phosphorylation, and ribosome were the three most important pathways enriched by differentially expressed genes. Although it cannot be used for hyperfine localization of bamboo wall thickness-regulatory genes, gene set reduction narrows down the range of candidate genes at minimal cost and provides new clues for the application of bioinformatics in plant research.


Introduction
Bamboo is the fastest growing and most versatile plant on earth, holding significant economic and cultural value [1,2]. Bamboo products have become an integral part of the daily lives of people globally, bringing immeasurable benefits. There are approximately 75 bamboo genera with approximately 1300 species and varieties covering 25 million hectares worldwide [3]. The thickness of the culm wall is one of the remarkable characteristics of the different species of bamboo plants, which directly affects their use. For example, bamboo with significantly thickened culm walls, such as Phyllostachys edulis 'Pachyloen', can be used as wood [4]. Therefore, the growth process and regulatory mechanisms of bamboo culm wall thickening are being continuously explored.
Using anatomical and mathematical modeling methods, the shoot apical meristem has been found to be a key factor in bamboo culm wall thickening [5]. The walls of bamboo stalks are primarily composed of thin-walled basic tissue cells and vascular bundles, with thick-walled cells forming the main body. Vascular bundle-related genes regulate the growth and development of vascular bundles, and improve plant resistance [6]. Comparison of transcriptomic data from P. edulis and P. edulis 'Pachyloen' at different developmental stages revealed several P. edulis 'Pachyloen'-specific genes that may play an important role in bamboo wall thickening [7]. However, it is still unclear which genes genes might contribute to the physiological process of bamboo culm wall thickening. Investigating regulatory genes for culm wall thickening is an important addition to bamboo studies.
The development of molecular biology assessment techniques has resulted in a new field of plant research. The use of molecular biology to study a particular trait in plants has gained increasing interest [8][9][10][11]. By constructing large-scale artificial or natural plant populations, assisted by highthroughput sequencing technology, it is possible to discover regulatory genes and mechanisms for specific traits [12][13][14]. However, it is also undoubtedly costly, both financially and timewise. In this study, we applied gene set subtraction to investigate genes that regulate bamboo culm wall thickening ( Figure 1). Schizostachyum dumetorum var. xinwuense belongs to Schizostachyum genus and has a thin culm wall. In contrast, P. edulis 'Pachyloen', P. nidularia f. farcta, and P. heteroclada f. solida are members of the genus Phyllostachys, with distinctly thickened culm walls that are 1-2 times thicker than S. dumetorum var. xinwuense [15][16][17]. We re-sequenced the whole genomes of these four bamboo species simultaneously and compared them with the P. edulis reference genome. We then intersected the obtained differential gene sets to narrow the candidate range of bamboo culm wall thickening regulatory genes. Gene set subtraction is an innovative technique for biological genome research. heteroclada f. solida, and Schizostachyum dumetorum var. xinwuense was compared with the reference genome of P. edulis to obtain the different gene sets (A-D). The intersection of A-C contained essential genes that regulate bamboo wall thickness and maintain minimum survival functions. Similarly, the intersection of A-D only contain essential genes for maintaining minimal survival functions. Thus, the intersection of A-C minus the intersection of A-D result in the genes regulating bamboo culm wall thickening.

Plant Materials, DNA Extraction, and Whole Genome Resequencing
The samples of P. edulis 'Pachyloen', P. nidularia f. farcta, P. heteroclada f. solida, and S. dumetorum var. xinwuense were collected at 1.5 cm from the apex of the shoot on 21 March 2020, from the experimental bamboo forest (113.1124° E, 28.2698° N; 44.9 m above sea level) in Lukou Town, Changsha, Hunan Province, China. The collected shoots were immediately washed with distilled Figure 1. Subtraction of gene sets. The genome of P. edulis 'Pachyloen', P. nidularia f. farcta, P. heteroclada f. solida, and Schizostachyum dumetorum var. xinwuense was compared with the reference genome of P. edulis to obtain the different gene sets (A-D). The intersection of A-C contained essential genes that regulate bamboo wall thickness and maintain minimum survival functions. Similarly, the intersection of A-D only contain essential genes for maintaining minimal survival functions. Thus, the intersection of A-C minus the intersection of A-D result in the genes regulating bamboo culm wall thickening.

Plant Materials, DNA Extraction, and Whole Genome Resequencing
The samples of P. edulis 'Pachyloen', P. nidularia f. farcta, P. heteroclada f. solida, and S. dumetorum var. xinwuense were collected at 1.5 cm from the apex of the shoot on 21 March 2020, from the experimental bamboo forest (113.1124 • E, 28.2698 • N; 44.9 m above sea level) in Lukou Town, Changsha, Hunan Province, China. The collected shoots were immediately washed with distilled water, wrapped in foil, placed in a liquid nitrogen tank, and stored at −80 • C until DNA extraction. The voucher specimens were deposited in the College of Forestry, Nanjing Forestry University (NJFU2019-857). Genomic DNA was extracted using the DNeasy Plant Mini Kit (Qiagen, Germany) according with the manufacturer's instructions. Whole genome resequencing was performed using an Hiseq 2500 platform (Illumina, San Diego, CA, USA) at the Tsinghua University, Beijing, China.

Sequence Alignment and Detection of Single Nucleotide Polymorphisms (SNPs) and Insertion-Deletion Mutations (InDels)
Clean reads were mapped to the bamboo reference genome (P. edulis) [18] using the Burrows-Wheeler Aligner [19] and submitted to the National Center for Biotechnology Information (NCBI) Short Reads Archive (SRA) database (accession numbers SRP269283, SRP270039, SRP269348, and SRP269303 for S. dumetorum var. xinwuense, P. edulis 'Pachyloen', P. heteroclada f. farcta, and P. nidularia f. solida, respectively. The DRAGEN toolkit (https://www.illumina.com/products/bytype/informatics-products/dragen-bio-it-platform.html) was used to conduct read mapping and SNP identification, with the clean reads of FASTQ files used for alignment, sorting, duplicate removal, and variant identification. The hidden Markov model and Smith-Waterman alignment in GATK 4.0 Haplotype variant caller [20] were used for SNP identification. The raw SNP/InDel sets were called by SAM-tools with the parameters '−q 1−C 50−m 2−F 0.002−d 1000 , filtered for the mapping quality >20, and the depth of the variate position of >6.

Genetic Annotation and Identification of Candidate Genes
The SNP variations, InDel mutations, and copy number variations located in the coding/exon regions were enriched to identify genetic annotations and candidate genes. Gene ontology (GO) enrichment analysis (http://www.geneontology.org) was implemented using the GOseq R package [21], in which gene length bias was corrected. GO terms with a corrected p-value < 0.01 were considered significantly enriched. Kyoto encyclopedia of genes and genomes (KEGG) pathway enrichment analysis (http://www.kegg.jp) was performed using KOBAS 2.0 [22] and visualized using the clusterProfiler R package. According to the condition of corrected p-value < 0.01, the KEGG terms were considered significantly enriched.

Genome-Wide Variation of Four Species of Bamboo
P. edulis 'Pachyloen', P. nidularia f. farcta, P. heteroclada f. solida, and S. dumetorum var. xinwuense generated a large amount of SNP and InDel loci information when compared with the reference genome of P. edulis ( Table 1). The resequencing results of P. edulis 'Pachyloen' showed that the ratio of base transitions to transversions (Ts/Tv) was 2.93, which was slightly higher than that reported in Arabidopsis [23]. The frequency of the four base conversions from high to low was as follows: C→T (19.07%) > G→A (19.01%) > A→G (18.27%) > T→C (18.19%). There were 1,718,960 SNPs in the P. edulis 'Pachyloen' genome, with an average of about 1093 bases showing a mutation, representing a mutation frequency of 0.6%, which was lower than the 1% mutation frequency reported previously [24]. A total of 72,337 InDels were found in the P. edulis 'Pachyloen' genome, of which 33,127 (45.80%) were homozygous, 38,799 (53.64%) were heterozygous, and 411 (0.56%) affected multiple alleles. The number of SNPs in the genomes of P. nidularia f. farcta and P. heteroclada f. solida was far greater than that of P. edulis 'Pachyloen'. On average, one SNP was found for every 94 and 103 bases of P. nidularia f. farcta, and P. heteroclada f. solida, respectively. Among the 489,064 InDels in P. nidularia f. farcta, 88,095 were heterozygous, 398,617 were homozygous, and 2352 affected multiple alleles. Among the 335,745 InDels of P. heteroclada f. solida, 282,184 (84.05%) were distributed in the intergenic region, 33,489 (9.97%) were in intron regions, and 12,290 (3.66%) were in exon regions.
The number of SNPs detected in S. dumetorum var. xinwuense with obvious thin-walled characteristics was 1,683,311, with an average of 958 bases appearing in one SNP, which is close to the frequency of base mutations in the P. edulis 'Pachyloen', but significantly lower than that of P. nidularia f. farcta and P. heteroclada f. solida.

Determination of the Differential Gene Set
Alignment of P. edulis 'Pachyloen', P. nidularia f. farcta, P. heteroclada f. solida, and S. dumetorum var. xinwuense sequences to the reference genome generated 3055; 4309; 4656; and 3711 differential genes, respectively. There were a total of 891 genes in the different intersections of the four bamboo species, which regulate the common physiological activities ( Figure 2). P. edulis 'Pachyloen', P. nidularia f. farcta, and P. heteroclada f. solida, all of which have thick-walled traits, had 1524 identical differentially expressed genes. Therefore, the remaining 633 genes may be solely related to the regulation of bamboo wall thickening. The IDs of these genes are provided in Supplementary File S1. The number of SNPs in the genomes of P. nidularia f. farcta and P. heteroclada f. solida was far greater than that of P. edulis 'Pachyloen'. On average, one SNP was found for every 94 and 103 bases of P. nidularia f. farcta, and P. heteroclada f. solida, respectively. Among the 489,064 InDels in P. nidularia f. farcta, 88,095 were heterozygous, 398,617 were homozygous, and 2352 affected multiple alleles. Among the 335,745 InDels of P. heteroclada f. solida, 282,184 (84.05%) were distributed in the intergenic region, 33,489 (9.97%) were in intron regions, and 12,290 (3.66%) were in exon regions.
The number of SNPs detected in S. dumetorum var. xinwuense with obvious thin-walled characteristics was 1,683,311, with an average of 958 bases appearing in one SNP, which is close to the frequency of base mutations in the P. edulis 'Pachyloen', but significantly lower than that of P. nidularia f. farcta and P. heteroclada f. solida.

Determination of the Differential Gene Set
Alignment of P. edulis 'Pachyloen', P. nidularia f. farcta, P. heteroclada f. solida, and S. dumetorum var. xinwuense sequences to the reference genome generated 3055; 4309; 4656; and 3711 differential genes, respectively. There were a total of 891 genes in the different intersections of the four bamboo species, which regulate the common physiological activities ( Figure 2). P. edulis 'Pachyloen', P. nidularia f. farcta, and P. heteroclada f. solida, all of which have thick-walled traits, had 1524 identical differentially expressed genes. Therefore, the remaining 633 genes may be solely related to the regulation of bamboo wall thickening. The IDs of these genes are provided in Supplementary File 1.

GO and KEGG Enrichment Analysis of Differentially Expressed Genes (DEGs)
The identified 633 DEGs were enriched and analyzed according to the latest GO and KEGG annotation information on the Moso bamboo reference genome (http://bamboogdb.com). These genes were enriched in 36 GO terms, of which 14 belonged to biological processes (BP), 9 belonged to cellular components (CC), and 13 belonged to the molecular function (MF) subset ( Figure 3). Overall, the number of DEGs enriched in 'catalytic activity' was the largest, reaching 516, whereas the 'nutrient reservoir activity' had only one gene. In the BP category, the vast majority of unigenes were enriched in cellular and metabolic processes. It was also found that 'binding' and 'catalytic activity' had the highest degree of enrichment in the MF category.

GO and KEGG Enrichment Analysis of Differentially Expressed Genes (DEGs)
The identified 633 DEGs were enriched and analyzed according to the latest GO and KEGG annotation information on the Moso bamboo reference genome (http://bamboogdb.com). These genes were enriched in 36 GO terms, of which 14 belonged to biological processes (BP), 9 belonged to cellular components (CC), and 13 belonged to the molecular function (MF) subset ( Figure 3). Overall, the number of DEGs enriched in 'catalytic activity' was the largest, reaching 516, whereas the 'nutrient reservoir activity' had only one gene. In the BP category, the vast majority of unigenes were enriched in cellular and metabolic processes. It was also found that 'binding' and 'catalytic activity' had the highest degree of enrichment in the MF category. Forests 2020, 11, x FOR PEER REVIEW 6 of 14 KEGG analysis provided a more intuitive understanding of the function and interaction of differential genes, revealing that the 633 DEGs were enriched in 116 pathways. The analysis showed (Figure 4) that the most significantly enriched pathway was 'nitrogen metabolism', followed by 'RNA transport'. The pathway with the largest amount of DEGs enrichment was the 'metabolic pathway', and 'biosynthesis of secondary metabolites' was the second. The KEGG enrichment pathway information for the 633 DEGs is provided in Supplementary File 2.
From the perspective of the secondary classification level of the KEGG metabolic pathway analysis, in the 'Carbohydrate metabolism', the 'Starch and sucrose metabolism' pathway was the most enriched core profile, encompassing 19 genes. In 'lipid metabolism', the highest number of genes was annotated to the 'Glycerophospholipid metabolism' pathway, with 10 genes. In 'Glycan biosynthesis and metabolism', six core genes were linked to 'N-Glycan biosynthesis'. Among 'Transcription' and 'Translation', the most differentially enriched subsets were 'Spliceosome' and 'RNA transport' with 34 genes, each. KEGG analysis provided a more intuitive understanding of the function and interaction of differential genes, revealing that the 633 DEGs were enriched in 116 pathways. The analysis showed (Figure 4) that the most significantly enriched pathway was 'nitrogen metabolism', followed by 'RNA transport'. The pathway with the largest amount of DEGs enrichment was the 'metabolic pathway', and 'biosynthesis of secondary metabolites' was the second. The KEGG enrichment pathway information for the 633 DEGs is provided in Supplementary File S2.
From the perspective of the secondary classification level of the KEGG metabolic pathway analysis, in the 'Carbohydrate metabolism', the 'Starch and sucrose metabolism' pathway was the most enriched core profile, encompassing 19 genes. In 'lipid metabolism', the highest number of genes was annotated to the 'Glycerophospholipid metabolism' pathway, with 10 genes. In 'Glycan biosynthesis and metabolism', six core genes were linked to 'N-Glycan biosynthesis'. Among 'Transcription' and 'Translation', the most differentially enriched subsets were 'Spliceosome' and 'RNA transport' with 34 genes, each.

Analysis of Critical Pathways
The greater the number of DEGS in a particular type or metabolic pathway, the greater the redundancy of genes, involved in that type of life activity, the richer the metabolic pathways that may occur, resulting in a greater variety of physiological phenomena. A high number of DEGs in metabolic pathways makes it more likely for those genes to regulate life activities and have a greater impact on the phenotypic traits of species [25][26][27][28]. The analysis of the main enrichment pathways among the 633 core genes could help better understand the molecular basis of bamboo culm wallthickening. The culm wall requires the mobilization of more sugars for synthesis and more active energy metabolism during the thickening growth [29][30][31]. Therefore, we selected the three pathways with a higher number of genetic mutations occurring in metabolic pathways, as well as representative genes of those pathways, to explore further.

Analysis of Critical Pathways
The greater the number of DEGS in a particular type or metabolic pathway, the greater the redundancy of genes, involved in that type of life activity, the richer the metabolic pathways that may occur, resulting in a greater variety of physiological phenomena. A high number of DEGs in metabolic pathways makes it more likely for those genes to regulate life activities and have a greater impact on the phenotypic traits of species [25][26][27][28]. The analysis of the main enrichment pathways among the 633 core genes could help better understand the molecular basis of bamboo culm wall-thickening. The culm wall requires the mobilization of more sugars for synthesis and more active energy metabolism during the thickening growth [29][30][31]. Therefore, we selected the three pathways with a higher number of genetic mutations occurring in metabolic pathways, as well as representative genes of those pathways, to explore further.
Starch and sucrose are the main end-products of plant photosynthesis [32]. SNPs or InDels were present in 19 of the core poor genes in the starch and sucrose metabolic pathways ( Figure 5), but there were six genetic mutations that did not cause amino acid changes (PH01000679G0170, PH01000035G0110, PH01000026G0020, PH01005567G0040, PH01000376G0610, PH01001213G0550). The remaining 13 mutant genes changed the number of amino acids that were not the same ( Table 2). Starch and sucrose are the main end-products of plant photosynthesis [32]. SNPs or InDels were present in 19 of the core poor genes in the starch and sucrose metabolic pathways ( Figure 5), but there were six genetic mutations that did not cause amino acid changes (PH01000679G0170, PH01000035G0110, PH01000026G0020, PH01005567G0040, PH01000376G0610, PH01001213G0550). The remaining 13 mutant genes changed the number of amino acids that were not the same ( Table 2).  Oxidative phosphorylation is an important part of the process by which substances release energy during oxidation in plants [33]. Among the 633 DEGs set, 15 genes were enriched in the metabolic pathway of oxidative phosphorylation, of which seven gene mutations (PH01000465G0500, PH01003254G0190, PH01001083G0220, PH01002804G0240, PH01002320G0330, PH01002524G0200, PH01000299G0200) did not cause amino acid changes ( Figure 6 and Table 3).  Oxidative phosphorylation is an important part of the process by which substances release energy during oxidation in plants [33]. Among the 633 DEGs set, 15 genes were enriched in the metabolic pathway of oxidative phosphorylation, of which seven gene mutations (PH01000465G0500, PH01003254G0190, PH01001083G0220, PH01002804G0240, PH01002320G0330, PH01002524G0200, PH01000299G0200) did not cause amino acid changes ( Figure 6 and Table 3).  The ribosome is the place where the process of RNA translation into protein occurs, which directly affects the performance of organisms [34]. A total of 29 core DEGs were annotated into the 'ribosomal metabolic' pathways, 10 of which were distributed on the 'small subunit' and 19 on the 'large subunit' (Figure 7). Mutations in 18 of the 29 DEGs altered the sequence of amino acids, 11 on the 'large subunit' and 7 on the 'small subunit'. The number of amino acid changes caused by the PH01000716G0300 gene was large, leading to six amino acid sequence changes. PH01000145G0580, PH01000632G0190, PH01001171G0230, PH01002396G0040, PH01000716G0300, PH01000553G0230, and PH01001955G0010 also had base insertions and deletions at the same time in the exon region, which had a stronger impact on RNA translation (Table 4).  The ribosome is the place where the process of RNA translation into protein occurs, which directly affects the performance of organisms [34]. A total of 29 core DEGs were annotated into the 'ribosomal metabolic' pathways, 10 of which were distributed on the 'small subunit' and 19 on the 'large subunit' (Figure 7). Mutations in 18 of the 29 DEGs altered the sequence of amino acids, 11 on the 'large subunit' and 7 on the 'small subunit'. The number of amino acid changes caused by the PH01000716G0300 gene was large, leading to six amino acid sequence changes. PH01000145G0580, PH01000632G0190, PH01001171G0230, PH01002396G0040, PH01000716G0300, PH01000553G0230, and PH01001955G0010 also had base insertions and deletions at the same time in the exon region, which had a stronger impact on RNA translation (Table 4).

A New Perspective on Gene Set Subtraction
The subtraction of gene sets provides new clues for bioinformatics research applications in plants. Plant traits are regulated by a network of numerous genes [35], thus, probing those genes through bioinformatics is often costly as it requires the analysis of large populations and expensive sequencing. Although the cost of sequencing has been reduced as a result of technological advances, conventional research strategies for plant trait regulation, such as bulked segregant analysis (BSA) [36,37] and genome-wide association studies (GWAS) [38,39], require large populations and large sample sizes to meet the needs, ultimately resulting in high research costs. The set of differential genes obtained by phase-down whole-genome resequencing can greatly narrow the range of candidate differential genes. The use of gene set subtraction provides a new strategy to apply bioinformatics. Not only resequencing data, but also transcriptomic and proteomic data, can be used to explore questions along these lines. The combination of bioinformatics and mathematics can provide a shortcut to complex biological problems.
However, it cannot be denied that this method is flawed and does not allow for hyperfine localization of trait-regulated genes. There are countless differences between each life form. The DEGs set obtained by subtracting the gene set may be related to all the differences. Therefore, we cannot be completely sure which specific gene in the gene set is responsible for which differences. By analyzing the regulatory and metabolic pathways influenced by these genes, we can essentially identify the main physiological activity differences responsible for the significant differences in traits.

A Gene Set for Bamboo Culm Wall Thickness Regulation Containing 633 Candidate Genes
Bamboo culm walls vary in thickness, regardless if it is herbal bamboo or woody bamboo. The growth of bamboo shoots before being unearthed is dominated by thickening growth, and the thickness of culm after being unearthed has already been determined. After being excavated, the growth of bamboo culm wall will not be thickened, and the growth of internodes will be dominated by high growth. The most significant difference between P. edulis 'Pachyloen', P. nidularia f. farcta, P. heteroclada f. solida, and S. dumetorum var. xinwuense is the thickness of the culm wall [40]. We have identified a gene set, containing 633 thickness-associated regulatory candidate genes of bamboo culm, by sampling during the thickening growth process and using gene set subtraction. In addition to the thickness of the culm wall, there are still imperceptible differences between the four bamboo species. The genes associated with these differences were essentially included in this candidate set. The screening of candidate gene sets can be of great help for subsequent research.

Reticulation Pathway for Bamboo Culm Wall Thickness Control
It was found that starch and sucrose, oxidative phosphorylation, and ribosome were the major metabolic pathways involved in bamboo culm wall thickening, with up to 116 metabolic pathways regulating these processes in a meshwork of metabolic pathways. It has been shown that the shoot apical meristem (SAM) plays an important role in the development of bamboo culm wall thickening. A larger SAM leads to elevated levels of hormones (e.g., cytokinin and growth hormone) and upregulation of hormone signaling and downstream functional genes (e.g., metabolism-related genes), ultimately leading to thickening of the bamboo culm wall [41]. The growth of SAM requires large amounts of energy and nutrients, which is consistent with the results of this study showing that many differential genes are enriched in the metabolic pathway of starch and sucrose and oxidative phosphorylation. Therefore, the study of the major metabolic pathways can help explain the differences in bamboo wall thickness, with a mutation in one or more of these pathways may result in a significant change in the trait. We have selected the three most abundant pathways to analyze the mutated genes, and although these genes will need to be validated in subsequent studies, we are significantly closer to confirming them. Bamboo with a thick culm wall is more valuable than thin-walled bamboo and has more applications. The use of modern breeding methods to cultivate good varieties with thickened bamboo culm walls after exploring the gene regulation network for thickened bamboo walls could be a boon to the bamboo industry.

Conclusions
In this study, we obtained a set comprising 633 genes related to bamboo culm wall thickness regulation by gene set reduction. Starch and sucrose, oxidative phosphorylation, and ribosome were the major metabolic pathways leading to bamboo culm wall thickening, as determined by DEGs enrichment analysis. Although it cannot be used for hyperfine localization of bamboo wall thickness-regulated genes, gene set reduction narrows down the range of candidate genes at minimal cost and provides new clues for the application of bioinformatics in plant research.