Combined Analysis of BSA-Seq Based Mapping, RNA-Seq, and Metabolomic Unraveled Candidate Genes Associated with Panicle Grain Number in Rice (Oryza sativa L.)

Rice grain yield is a complex and highly variable quantitative trait consisting of several key components, including the grain weight, the effective panicles per unit area, and the grain number per panicle (GNPP). The GNPP is a significant contributor to grain yield controlled by multiple genes (QTL) and is crucial for improvement. Attempts have been made to find genes for this trait, which has always been a challenging and arduous task through conventional methods. We combined a BSA analysis, RNA profiling, and a metabolome analysis in the present study to identify new candidate genes involved in the GNPP. The F2 population from crossing R4233 (high GNPP) and Ce679 (low GNPP) revealed a frequency distribution fitting two segregated genes. Three pools, including low, middle, and high GNPP, were constructed and a BSA analysis revealed six candidate regions spanning 5.38 Mb, containing 739 annotated genes. Further, a conjunctive analysis of BSA-Seq and RNA-Seq showed 31 differentially expressed genes (DEGs) in the candidate intervals. Subsequently, a metabolome analysis showed 1024 metabolites, with 71 significantly enriched, including 44 up and 27 downregulated in Ce679 vs. R4233. A KEGG enrichment analysis of these 31 DEGs and 71 differentially enriched metabolites (DEMs) showed two genes, Os12g0102100 and Os01g0580500, significantly enriched in the metabolic pathways’ biosynthesis of secondary metabolites, cysteine and methionine metabolism, and fatty acid biosynthesis. Os12g0102100, which encodes for the alcohol dehydrogenase superfamily and a zinc-containing protein, is a novel gene whose contribution to the GNPP is not yet elucidated. This gene coding for mitochondrial trans-2-enoyl-CoA reductase is involved in the biosynthesis of myristic acid, also known as tetradecanoic acid. The Os01g0580500 coding for the enzyme 1-aminoclopropane-1-carboxylate oxidase (OsACO7) is responsible for the final step of the ethylene biosynthesis pathway through the conversion of 1-aminocyclopropane-1-carboxylic acid (ACC) into ethylene. Unlike Os12g0102100, this gene was significantly upregulated in R4233, downregulated in Ce679, and significantly enriched in two of the three metabolite pathways. This result pointed out that these two genes are responsible for the difference in the GNPP in the two cultivars, which has never been identified. Further validation studies may disclose the physiological mechanisms through which they regulate the GNPP in rice.


Introduction
Rice (Oryza sativa L.) is the foremost staple food feeding half of the world's population and more than half of China's population [1]. By 2050, the global population will be around 9.7 billion, making the global food and energy demand more challenging than ever. The decrease in suitable agricultural land due to several factors, including climatic changes, leads to strategies to improve the rice grain yields without further expanding farmland and environmental damage [2]. Rice grain yield is a complex and highly variable quantitative trait that depends on three fundamental components: the weight of grain, the number of effective panicles per unit area, and the grain number per panicle (GNPP), each controlled by multiple genes (QTL). The GNPP has been revealed as the primary contributor to total yield per unit area and is essential for its improvement [3].
The physiological network and genes underlying panicle and grain formation have been thoroughly studied. The primary mechanism includes the initiation during which the shoot apical meristem (SAM) is transformed to inflorescence meristem (IM), the formation of rachis branches, and spikelet specialization successively. These steps are vital in the panicle architecture and GNPP [4,5]. Recent studies highlighted several hormones, including cytokinin (CK), gibberellin acid (GA), abscisic acid (ABA), and ethylene (ET), that interfere to regulate the transcriptional and post-transcriptional activities of genes in rice, acting in different pathways involved in these processes [6][7][8][9]. Numbers of genes have been identified from high-throughput QTL mapping, and within the past ten years, the number of the QTLs related to the GNPP doubled from 315 QTL [10], 369 [11] to 722 QTL (http://www.gramene.org; accessed on 10 April 2022) distributed along the 12 chromosomes in the rice genome. The first QTL grain number 1a (GN1a) was mapped and cloned on chromosome1 coding for a cytokinin oxidase (OsCKX2) which repressed the GNPP by reducing CK content [12], and with aberrant panicle organization1 (APO1) controlling the proliferation of cells in the meristem [4]. These works set the foundation for GNPP-related gene mapping. CK is essential in rachis branch formation, and the high expression of OsCKX2 weakened CK signaling in IM leads to few rachis branches and GNPP [13,14]. Besides regulating GA homeostasis, OsCYP71D8L negatively controls the length of the panicles, and the rachis branch numbers, and subsequently the rice GNPP [15]. Major QTL which controls, directly or indirectly, the GNPP and related traits include: heading date 7 (Ghd7), which increases the differentiation period [16]; LAX regulating the rachis branches initiation [17]; dense erect panicle1 (DEP1) controlling the secondary branches of inflorescence and GNPP [18]; grain number 4-1 (GN4-1) and grain number per panicle1 (GNP1) regulate the number of rachis branches and GNPP [10,19]; LONELY GUY (LOG) positively regulates GNPP by controlling the concentration and distribution of CK [13]; and Pyrabactin Resistance-Like (PYL) positively regulates ABA signaling and negatively regulates GNPP in rice [7].
The development of new molecular technology allowed the transition from the conventional method, and the integration of omics tools facilitated the identification of new functional genes and pathways. During the last decade, most of the studies for mapping QTL relied on traditional methods, which involve the construction of mapping the population through the F 2 generation, recombinant inbred lines (RIL), and near-isogenic lines (NIL), followed by the drawing of a genetic linkage map, and cloning. The following is a strategy by which numerous genes have been mapped and cloned. The most relevant include qGN1c mapped using a set of chromosomal segment substitution lines (CSSLs) [20], SPP1 identified through NILs controlling the number of spikelets [21], qTGW3.2 for grain weight in rice using RIL [22], GNP1 [19], and qgnp7(t) [23] governing GNPP revealed through map-based cloning, and two steps substitution mapping, respectively. However, traditional gene mapping is tedious and time-consuming; most QTLs are inconsistent and span large genomic regions, making identifying candidate genes for a trait more challenging. New strategies which combine traditional methods and emerging technologies such as high-throughput whole-genome sequencing (NGS), alternative sequencing analysis, mapping by sequencing, bulked segregants analysis (BSA), RNA-sequencing (RNA-seq), and genome-wide association study (GWAS), have accelerated the identification of candidate genes for complex characters, and several studies have been successfully conducted to identify a new gene [2,[24][25][26][27]. BSA-Seq and RNA profiling were coupled to reveal the pathway and genes associated with the heading type in Chinese cabbage [28], chilling tolerance in rice [24], and plant architecture in Brassica napus [29]. Likewise, significant gene candidates have been identified by combining sequencing and a metabolome analysis in albino jackfruit seedlings [30] and anthocyanin in cucumber fruit skin [31].
Although comprehensive studies on QTL mapping for the GNPP have been carried out, discovering new QTL remains an excellent value for molecular and practical breeding. Considering the complexity of grain number related-genes and the limitation of traditional QTL mapping technology, in this study, we successfully integrated the BSA-seq, RNA-Seq, and metabolome analysis to identify two new genes linked to the GNPP.

Plant Materials and Growth Conditions
In this work, two indica rice varieties from Guangxi province, China, were used, including Ce679 with a low grain number and R4233 with a high grain number. Ce679 is a restorer line developed from a common wild rice and IR661 and IR2061 (IR661 IR2061/Hepu wild rice). It has strong lodging resistance, high combining ability, good rice quality, and the strong cold resistance of Hepu wild rice. R4233 is a restorer line developed through three generations of backcrossing of Ce679 and F 1 (Ce679 75-1-172/C4064). After several tests, the R4233 restorer line contained the blast resistance gene Pi9 (from 75-1-172) and a high grain number compared to all other parents. Crossing between Ce679 and R4233 generated 436 F 2 segregating populations from which the genomic regions associated with the GNPP were identified. The parents Ce679 and R423 were grown in two different sites to evaluate if the environment will affect the GNPP, and assays were conducted from 2019 to 2021 during the two yearly planting seasons known as early season  and late season (20 August to 20 November). Phenotypic evaluations of grain number per panicle in the F 2 population were carried out during the period 2019-2020 (four seasons: two early and two late seasons) in an experimental field at Guangxi University, Nanning city, Guangxi province, China (22 • 48 N, 108 • 22 E), and the early season from March to August 2021 in Bobai, Guangxi province (22.27 • N, 109.98 • E), respectively ( Figure 1) with different climates. The experimental design was a randomized complete design. In total, three plots (two for the two parents and one for the F 2 ) of 1 m × 12 m and 1 × 18 m were designed. The distance between plants was 20 × 20 cm and each plot had 300 plants for parents and 436 plants for F 2 . and genome-wide association study (GWAS), have accelerated the identification of candidate genes for complex characters, and several studies have been successfully conducted to identify a new gene [2,[24][25][26][27]. BSA-Seq and RNA profiling were coupled to reveal the pathway and genes associated with the heading type in Chinese cabbage [28], chilling tolerance in rice [24], and plant architecture in Brassica napus [29]. Likewise, significant gene candidates have been identified by combining sequencing and a metabolome analysis in albino jackfruit seedlings [30] and anthocyanin in cucumber fruit skin [31]. Although comprehensive studies on QTL mapping for the GNPP have been carried out, discovering new QTL remains an excellent value for molecular and practical breeding. Considering the complexity of grain number related-genes and the limitation of traditional QTL mapping technology, in this study, we successfully integrated the BSA-seq, RNA-Seq, and metabolome analysis to identify two new genes linked to the GNPP.

Plant Materials and Growth Conditions
In this work, two indica rice varieties from Guangxi province, China, were used, including Ce679 with a low grain number and R4233 with a high grain number. Ce679 is a restorer line developed from a common wild rice and IR661 and IR2061 (IR661∥IR2061/Hepu wild rice). It has strong lodging resistance, high combining ability, good rice quality, and the strong cold resistance of Hepu wild rice. R4233 is a restorer line developed through three generations of backcrossing of Ce679 and F1 (Ce679∥75-1-172/C4064). After several tests, the R4233 restorer line contained the blast resistance gene Pi9 (from 75-1-172) and a high grain number compared to all other parents. Crossing between Ce679 and R4233 generated 436 F2 segregating populations from which the genomic regions associated with the GNPP were identified. The parents Ce679 and R423 were grown in two different sites to evaluate if the environment will affect the GNPP, and assays were conducted from 2019 to 2021 during the two yearly planting seasons known as early season (15 March-15 July) and late season (20 August to 20 November). Phenotypic evaluations of grain number per panicle in the F2 population were carried out during the period 2019-2020 (four seasons: two early and two late seasons) in an experimental field at Guangxi University, Nanning city, Guangxi province, China (22°48′ N, 108°22′ E), and the early season from March to August 2021 in Bobai, Guangxi province (22.27° N, 109.98° E), respectively ( Figure 1) with different climates. The experimental design was a randomized complete design. In total, three plots (two for the two parents and one for the F2) of 1 m × 12 m and 1 × 18 m were designed. The distance between plants was 20 × 20 cm and each plot had 300 plants for parents and 436 plants for F2.   For each trial, ten plants were harvested from Ce679 and R4233, the total grain number was counted, and the number of panicles per plant was evaluated from an individual plant. The grain per panicle was obtained by dividing the total grain per plant by panicles per plant. The data presented here are the means with SD (n = 10). ERN and LRN denote early rice and late rice in Nanning, respectively; ERB denotes early rice in Bobai.

Phenotypic Evaluation
To investigate agronomic traits, we grew the two varieties in the field. Ten individuals were chosen from each parent after maturation. Different traits include panicle length, filled grain number, total grain number, number of tillers, secondary and primary branches, GNPP, grain length, thickness and width, and thousand kernel weight. After the crossing, the number of panicles and grain per plant were evaluated in the F 2 population. We obtained the GNPP by dividing the total grain number of a plant by the total panicle number. Statistical analyses were performed using a t-test. To analyze the variance and compare the mean differences (p ≤ 0.05), we used SigmaPlot software v. 125 (Systat Software Inc., San Jose, CA, USA) and Duncan's multiple range test, respectively.

Sample Collection, Extraction of the Genomic DNA, and Construction of Segregating Pools
For the BSA-seq analysis, leaves were collected in April 2020 for all 436 F 2 plants, labelled, and kept in −80 • C. After maturation, the grain number from different plants was recorded, the frequency distribution was drawn, and samples were chosen for DNA extraction. The samples for the RNA-seq and metabolome analysis were collected in May 2021 from young panicles, uniform in length (≤2 cm) [32], and disease-free during the harvesting period. Upon collection, the young panicles were snap-frozen, then stored in the freezer at −80 • C for further experiments. The experimental design was made of three biological replicates to reduce errors.
DNA extraction was performed through the cetyltrimethylammonium bromide (CTAB). Briefly, 0.5 g of leaf sample was collected from each offspring F 2 individual deriving from Ce679 and R423 crossing, mixed, ground in liquid nitrogen into a powder form in the 2 mL centrifuge tube using a grinder. Then, 800 µL 2% preheated CTAB extraction buffer (CTAB-4 g; NaCl-16.34 g; 1 M Tris-HCl-20 mL (PH 8.0); 0.5 M EDTA-8 mL; PVP-360-2 g volume to 200 mL (pH 8) re-sterilization, preheated in a water bath to 65 • C for 30 min was added and incubated in Mary's bath at 65 • C for 40 min with intermittent shaking every 10 min. Then, one volume (400 µL) of chloroform-isoamyl alcohol (24:1) was added, thoroughly mixed by inverting the tube for 5 min before centrifugation (12,000× g, 5 min). The suspension was aspirated carefully and transferred to a new 1.5-mL centrifuge tube. Afterward, twice the volume of isopropyl-alcohol was added, mixed gently, and stood at −20 • C for over 30 min. The nucleic acid in the aqueous phase was pelleted after centrifugation (12,000× g, 5 min). A 0.5 mL volume of 70% ethanol was added to the precipitate, and centrifuged (12,000× g, 5 min) after 5 min at room temperature. The above wash was repeated. The sample was air-dried and dissolved in 50 µL sterilized deionized water. DNA concentration was estimated by the Nanodrop 1000 spectrophotometer and run on 1% agarose gel electrophoresis to assess purity. DNA from 30 plants representing high, middle, and low grain numbers were equally mixed to form H, M, and L pools.

Bulk Segregants Analysis Sequencing
We prepared five DNA libraries from the two parents and three constructed pools, followed by sequencing. Briefly, DNA was fragmented by ultrasonication into small pieces of 350 bp, ligation with the adapters, and then purified. Further, the Illumina HiseqTM2000 platform (Beijing Biomarker Biotechnology Co., Beijing, China) was used to sequence the DNA. We filtered data to obtain high-quality reads to ensure successful progress in the subsequent analysis. Data filtering involved trimming the base with low quality, removing the reads with 50% bases with a Q-score less than 10, and those with more than 10% missing bases. The clean reads obtained after filtering were mapped to the reference genome (Oryza_sativa_IRGSP-1.0) using BWA software [33]. Subsequently, we performed SNP calling and annotation using GATK tools and SnpEff software [34,35]. Reads were removed on the reference genome with the Mark Duplicates tool in Picard (http://sourceforge.net/projects/picard/; accessed on 28 October 2020). The local rearrangement and base mass value calibration was carried out to detect SNP and small indels (1-5 bp) using the GATK software package [36].
Further, we used SnpEff software to perform SNP annotation and determine the impacts (synonymous and non-synonymous mutations) of small indels in the genome [35]. The candidate regions of the genome associated with the GNPP were identified, and the differences in allele frequency between bulked pools were performed with the SNP-index algorithm [34,37]. The SNP index was called the scale of short reads containing SNPs different from the reference genome [38]. The ∆(SNP-index) was referred to as the SNPindex difference between H-pool and L-pool, H-pool and M-pool, and M-pool and L-pool and was used to calculate the 1000 permutations in the genome with 95% confidence within the candidate regions of the GNPP [2]. The distribution of the SNP index among the genome within 1 Mb width windows and 1 kb at each step was calculated through the sliding window analysis. The above analysis was carried out through the online platform BMKCloud (http://www.biocloud.com/; accessed on 4 November 2020).

RNA-Sequencing and Gene Profiles Analysis
Fresh young panicles about 2 cm [32] were collected from different plants, and 0.5 g were used for total RNA isolated using a TRIzol ® reagent kit (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's protocol. RNA concentration, purity, and integrity were evaluated using an Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA, USA) and agarose gel electrophoresis. We further generated a cDNA library following the Invitrogen protocol. The polyA selection method by oligo (dT)-attached magnetic beads was used to isolate and enrich mRNA from total RNA. Afterward, the enriched mRNA was fragmented into small pieces of 350 bp. This experiment was performed by adding a fragmentation buffer. The fragments were reverse transcribed into the first-strand cDNA with random primers. The second-stranded cDNA was synthesized by reverse transcriptase and purified with a QiaQuick PCR extraction kit (Qiagen, Venlo, The Netherlands). Subsequently, the cDNA structure was end-repaired, a poly (A) tail was added, and the index adaptor was ligated to prepare hybridization. The ligated products were selected based on the size after running on 2% low-range ultragel electrophoresis (Certified Low Range Agarose, Bio-Rad, Shanghai, China). The PCR amplification was performed for 15 cycles and sequenced using Illumina Hiseq2500 by Gene DeNovo Biotechnology Co., Guangzhou, China.
Quality control of raw data was carried out before data analysis. We generated raw reads in the FASTQ format. Then, low-quality reads and adapters were filtered to obtain the clean reads. Subsequently, we used HISAT2 2.4 with RNA-strandedness and the default parameter [39] to assemble and map high-quality reads to the reference genome (http://plants.ensembl.org/Oryza_sativa_IRGSP-1.0; accessed on 9 May 2021). Fragments per kilo-base of transcript per million (FPKM) were estimated to quantify the gene expression levels [39]. The differentially expressed genes (DEGs) were analyzed through DESEeq2 software [40]. Multiple hypotheses with the p-value thresholds of fold change (FC) ≥ 2 and false discovery rate (FDR) ≤ 0.05 were applied. Gene ontology was performed with the GOSeq R package. Gene function and utilities of the biological system were annotated based on the Kyoto Encyclopedia of Genes and Genomes (KEGG; http://www.genome.jp/kegg; accessed on 20 May 2021) [41]. The KEGG enrichment analysis of DEGs has been performed using KOBAS software [42]. All the analyses above were performed using free online data analysis OmicShare tools (https://www.omicshare.com/tools; accessed on 25 May 2021).

Metabolomes Analysis Sample Preparation and Extraction of Metabolites
The samples were freeze-dried and crushed using a mixer mill (MM 400, Retsch, Haan, Germany) with a zirconia bead for 1.5 min at 30 Hz. In total, 90-105 mg of powder was weighed and extracted overnight at 4 • C with 1 mL of 70% aqueous methanol containing 0.1 mg/L lidocaine for the internal standard. Samples were then centrifuged at 10.000 g for 10 min. The supernatant was absorbed and filtered (SCAA-104, 0.22 µm pore size; ANPEL, Shangai, China, www.anpel.com.cn/; accessed on 25 May 2021) before the liquid chromatography-mass spectrophotometer (LC-MS/MS) analysis. All samples' quality controls (QC) were performed to detect the experiment's reproducibility and ensure that a scientific process met the qualitative and quantitative criteria. The QC samples were analyzed intermittently for the duration of the analytical study to assess the variance observed in the data throughout the sample preparation, data acquisition, and pre-processing steps. Replicate injections should provide comparable data for each injection; however, an analytical variance was be observed, and the replicate QC injections were used to measure this variance across the analytical study.
LIT and triple quadrupole (QQQ) scans were acquired on a triple quadrupole-linear ion trap mass spectrometry (Q TRAP), AB Sciex QTRAP6500 system, equipped with an ESI-Turbo Ion-Spray interface, operating in a positive ion mode and controlled by Analyst 1.6.1 software (AB Sciex). The operation parameters were as follows: ESI source temperature 500 • C; ion spray voltage (IS) 5500 V; curtain gas (CUR) 25 psi; the collisionactivated dissociation (CAD) was set the highest. QQQ scans were acquired as MRM experiments with optimized declustering potential (DP) and collision energy (CE) for each MRM transition. The m/z range was set between 50 and 1000.

Data Processing, Annotation, and Metabolites Identification
The data filtering, peak detection, alignment, and calculations were performed using Analyst 1.6.1 software. Metabolites were identified by searching an internal database and public databases (MassBank, KNApSAcK, HMDB [44], MoTo DB, and METLIN [45]). Then the m/z values, the RT, and the fragmentation patterns were compared with the standards. Gene Denovo Biotechnology Co., Ltd. (Guangzhou, China) conducted the processing and annotation of our metabolomic data. In the preliminary visualization of differences between different groups of samples, we applied the unsupervised dimensionality reduction method principal component analysis (PCA) in samples using R package models for the multivariate analysis (http://www.r-project.org/; accessed on 4 June 2021). For an analysis of differential metabolite profiles, we applied variable importance in the projection (VIP) score with 1 set as the threshold of the (O)PLS model, which allowed us to rank the metabolites that best distinguished between two groups. Differential metabolites were screened between two groups using a t-test as a univariate analysis, and those with p ≤ 0.05, VIP ≥ 1 were considered differentially expressed. Metabolites were mapped to KEGG metabolic pathways (http://www.kegg.jp/kegg/pathway.html; accessed on 9 June 2021) to identify the pathway enrichment and were annotated using the KEGG compound database (https://www.kegg.jp/kegg/compound/; accessed on 30 June 2021). The pathway enrichment analysis identified significantly enriched metabolic pathways or signal transduction pathways in differential metabolites compared with the whole background. The calculated p-value was generated through an FDR correction, taking FDR ≤ 0.05 as a threshold. Pathways fitting this condition were defined as significantly enriched pathways in differential metabolites.

Expression Analysis of Putative Grain Number Genes by Reverse Transcription Quantitative PCR (RT-qPCR)
To evaluate the gene expression, we performed an RT-qPCR. The total RNA was isolated from rice young panicle using the EASYspin RNA Rapid Plant kit (RA106-02, Biomed, www.biomed168.com; accessed on 10 June 2022) according to the manufacturer's instructions. We chose six candidate genes differentially expressed in Ce679 vs. R4233 including Os01g0600900, Os12g0102100, Os01g0580300, Os01g0580500, Os01g0591000, and Os01g0589000. Rice OsActin1 was used as an internal reference gene to normalize the gene expression level. The primer sequences listed in Table S1 were retrieved from https:// biodb.swu.edu.cn/qprimerdb/best-primers-ss; accessed on 10 June 2022. First-strand fulllength cDNAs were synthesized from 2 µg of total RNA using the StarScript II First-strand cDNA Synthesis Mix with gDNA Remover (A224-05, GenStar, Beijing, www.gene-star.com; accessed on 10 June 2022) according to the manufacturer's instruction. The RT-qPCR was carried out using the ChamQ Universal SYBR qPCR Master Mix (Q711-02, Vazyme, China, www.vazyme.com; accessed on 12 June 2022) on the QTOWER3G (Germany) according to the manufacturer's instructions. Six biological repeats were used for the treatment (R4233) and control (Ce679) for gene expression profiles. The reaction was adjusted following the thermal cycling conditions as the initial denaturing temperature, 95 • C for 1 min, followed by 40 cycles, and each cycle consisted of 95 • C for 5 s and 60 • C for 30 s. The gene expression level was calculated by the 2 −∆∆Ct calculation method.

Phenotypic Evaluation of R4233 and Ce679
In all trials in Nanning and Bobai, R4233 demonstrated a high grain number compared to Ce679 and the mean value of the GNPP over the three years was 262.58 and 158.29, respectively. The GNPP in R4233 was nearly 1.66-fold more than that in Ce679. Other important agronomic traits were also investigated in the Nanning site from August to December 2020, as illustrated in Table 1. Our results showed a significant difference (p ≤ 0.01) in flag leaf length, number of secondary branches per the main panicle, grain length, grain number per panicle, length-width ratio, and grain yield per plant R4233 and Ce679. In contrast, no significant differences in panicle length, plant height, tiller number, seed setting ratio, and grain width were identified (Table 1). We evaluated the agronomic parameters in hybrid F1 from the crossing between Ce679 and R4233. The results showed a significant difference in the number of secondary branches per panicle, and also the grain number per panicle at 5% when we compared Ce679 and F1. A similar result was also obtained between R4233 and F1. However, it was higher in F1 and R4233 compared to Ce679 (Table 2). We highlighted that the number of secondary branches per panicle was significantly higher (p ≤ 0.05) than the primary branches in both parents and F1. This result suggested that the secondary branches were the highest contributor to the total branches per panicle. We found that in Ce679, the primary and secondary branches per panicle contributed 17.7 and 82.3% to the total branches per panicle. In R4233 and F1, this contribution was 13.4 and 86.6% and 12.9 and 87.1%, respectively.

Variation of the GNPP in F 2 Population and the Construction of the Segregating Pools
The GNPP of individual plants in the F 2 ranged between 98.11 and 320.20. In Ce679 and R4233, it was found to be 140.90 and 226.05, respectively. As shown in Figure 2, the GNPP was dispersed with two peaks surrounded by the two mean values, indicating that the GNPP as a quantitative trait was controlled by multiple genes (QTL). In total,

Analysis of BSA-Seq Data and Reads Assembly
We constructed five cDNA libraries for BSA-seq and conducted the Illumina HiSeq platform. Through RNA-seq, 263.7 million (M) raw reads were generated. After data filtering, 41.35～71.35 M clean reads (>98%) were obtained from a different sample. Further, the clean reads were mapped to the reference genome, and the proportion of mapped

Analysis of BSA-Seq Data and Reads Assembly
We constructed five cDNA libraries for BSA-seq and conducted the Illu platform. Through RNA-seq, 263.7 million (M) raw reads were generated. A tering, 41.35～71.35 M clean reads (>98%) were obtained from a different sam the clean reads were mapped to the reference genome, and the proportion

Analysis of BSA-Seq Data and Reads Assembly
We constructed five cDNA libraries for BSA-seq and conducted the Illumina HiSeq platform. Through RNA-seq, 263.7 million (M) raw reads were generated. After data filtering, 41.35~71.35 M clean reads (>98%) were obtained from a different sample. Further, the clean reads were mapped to the reference genome, and the proportion of mapped reads to clean reads was 97.85%, 97.57%, 97.48%, 97.77%, and 97.12% in Ce679, R4233, L-pool, M-pool, and H-pool, respectively, with the sequencing depth ranging between 30 and 49 folds. The results showed that the sequencing depth was relatively close to each other in segregating pools and the parents, consistent with the accuracy of the BSA analysis. The one-fold coverage ratio ranged from 91.43 to 94.24% (Table 3) (Tables S3 and S4). The SNPs and indels' densities were evaluated in different chromosomes and the result was similar among the three pools, which varied from 0.00559 to 0.00741 per bp and 0.00127 to 0.00165 per bp. In all pools, the highest SNP and indel density was observed in chromosome 12 and 11, while the lowest was in chromosome 4 (Tables S5 and S6). High-density single nucleotide polymorphisms (SNPs) were used as highly favored makers to analyze genetic diversity and population structure, to construct high-density genetic maps, and to provide genotypes for genome-wide association analysis.

Identification of the Candidate Regions Related to the GNPP
The INDEL and SNP index, which represent the population's parental allele frequency, were used to calculate the candidate regions of the genome related to the GNPP. The ∆indel and ∆SNP-index were associated with the genomic position. As illustrated in the Manhattan plots (Figure 4), the peak regions above the red lines (99%, threshold value) represent the regions where the GNPP may be associated. The ∆SNP-index method identified three, eight, and one candidate regions associated with the GNPP on chromosomes 1 and 10,  (Table S8). The candidate regions identified from these two methods were intersected, and the final association regions were determined. This region spanned 5.38 Mb and contained 739 annotated genes (Table S9). As depicted in Table 4, three associated regions were identified on chromosome 1, covering 0.

Gene Expression Profile Analysis and Identification of Candidate Genes in the Final Associations' Regions
The RNA-sequencing of the six cDNA libraries was generated after filtering a total of 78.77 Mb clean reads with the average GC content of nearly 49.08%; Q20 and Q30 were 97.95% and 94.26%, respectively (Table 5). Further, using HISAT2 software, 93.95 to 94.62% of the clean reads were mapped to the reference genome Oryza sativa L. ssp. Japonica. Our analysis detected 19,625 and 19,666 genes in Ce679 and R4233, respectively, in which 18,750 genes were commonly expressed in Ce679 and R4233 (Figure 5a). A subsequent analysis showed 1562 differentially expressed genes (DEGs) between Ce679 vs. R4233, with 824 upregulated and 738 downregulated (Figure 5b).

Gene Expression Profile Analysis and Identification of Candidate Genes in the Final Associations' Regions
The RNA-sequencing of the six cDNA libraries was generated after filtering a total of 78.77 Mb clean reads with the average GC content of nearly 49.08%; Q20 and Q30 were 97.95% and 94.26%, respectively (Table 5). Further, using HISAT2 software, 93.95 to 94.62% of the clean reads were mapped to the reference genome Oryza sativa L. ssp. Japonica. Our analysis detected 19,625 and 19,666 genes in Ce679 and R4233, respectively, in which 18,750 genes were commonly expressed in Ce679 and R4233 (Figure 5a). A subsequent analysis showed 1562 differentially expressed genes (DEGs) between Ce679 vs. R4233, with 824 upregulated and 738 downregulated (Figure 5b). The BSA-Seq and RNA-Seq results suggested that among all the genes identified, 738 genes were expressed in the young panicle (Table S10), of which 31 genes (more than 4% of the 738 genes) were differentially expressed (Table S11). We investigated GO-enrichment to predict the biological function of different DEG sets and revealed that 31 DEGs belonged to three categories: biological process, cellular component, and molecular function. Most of the DEGs were assigned to metabolic, cellular, and cellular component organization or biogenesis processes of the biological process category (Figure 5d). In addition, it was found that the cell, cell parts, and membrane in the cellular component and binding, catalytic, and toxin activity were enriched in the molecular function category. These 31 DEGs were used for the KEGG analysis, and the results indicated that 11 pathways were significantly enriched (Figure 5c). These pathways contained six DEGs (Os01g0580300, Os01g0580500, Os01g0589000, Os01g0591000, Os01g0600900, and Os12g0102100) ( Table 6). Os01g0580500, known as OsACO7, aminocyclopropane-1-carboxylate oxidase gene, was related to Ethylene biosynthesis.  The BSA-Seq and RNA-Seq results suggested that among all the genes identified, 738 genes were expressed in the young panicle (Table S10), of which 31 genes (more than 4% of the 738 genes) were differentially expressed (Table S11). We investigated GO-enrichment to predict the biological function of different DEG sets and revealed that 31 DEGs belonged to three categories: biological process, cellular component, and molecular function. Most of the DEGs were assigned to metabolic, cellular, and cellular component organization or biogenesis processes of the biological process category (Figure 5d). In addition, it was found that the cell, cell parts, and membrane in the cellular component and binding, catalytic, and toxin activity were enriched in the molecular function category. These 31 DEGs were used for the KEGG analysis, and the results indicated that 11 pathways were significantly enriched (Figure 5c). These pathways contained six DEGs (Os01g0580300, Os01g0580500, Os01g0589000, Os01g0591000, Os01g0600900, and Os12g0102100) ( Table 6). Os01g0580500, known as OsACO7, aminocyclopropane-1-carboxylate oxidase gene, was related to Ethylene biosynthesis.

Metabolites Associated with the Young Panicle of Rice
The principal component analysis (PCA) approach allowed the comparison of metabolite peaks detected through the LC-MS/MS method in the young panicle of Ce679 and R4233. The samples were separated according to PC1 (38.1%) and PC2 (22.6%) ( Figure 6). The PCA plot showed a separation between the two varieties, suggesting a discrepancy in their metabolites in line with the phenotypic difference.

Metabolites Associated with the Young Panicle of Rice
The principal component analysis (PCA) approach allowed the comp tabolite peaks detected through the LC-MS/MS method in the young panicle R4233. The samples were separated according to PC1 (38.1%) and PC2 (22.6 The PCA plot showed a separation between the two varieties, suggesting in their metabolites in line with the phenotypic difference.

Combined Analysis of the Differential Accumulated Metabolites (DAMs) and
We performed a correlation analysis on DAMs and DEGs. Nine quadr were drawn, elucidating the metabolites' variations and their corresponding Pearson correlation coefficient over 0.99, and the correlation coefficient cluste (Figure 7). It showed that quadrants 9 and 7 had more DAMs and DEGs tha rants.
The correlation analysis was carried out on the differentially accumu lites (DAMs) and DEGs. The variations in the metabolites and their corresp with the Pearson correlation coefficient over 0.99 were selected to draw n diagrams and the correlation coefficient clustered heat map. As shown in

Combined Analysis of the Differential Accumulated Metabolites (DAMs) and DEGs
We performed a correlation analysis on DAMs and DEGs. Nine quadrant diagrams were drawn, elucidating the metabolites' variations and their corresponding genes with a Pearson correlation coefficient over 0.99, and the correlation coefficient clustered heat map (Figure 7). It showed that quadrants 9 and 7 had more DAMs and DEGs than other quadrants.  . The x-axis indicates the log2 ratio of genes, and the y-axis the log2 ratio Black dotted lines indicate the threshold. Each gene indicates a metabolite or gene. Bla the unchanged genes or metabolites; green dots indicate DAMs with unchanged g represent DEGs with unchanged metabolites; DEGs and DAMs are shown by red do 9 quadrants from top to bottom and left to right with black dotted lines. The quadra ④ indicate that the expression abundance of metabolites is higher than that of genes ③ and ⑦ indicate that the expression patterns of genes are consistent with the m quadrant ⑤ indicates that the genes and metabolites are not differentially expressed ⑥, ⑧, and ⑨ denote that the expression abundance of metabolites is lower than th By comparing replicated samples from Ce679 and R4233, the differentia metabolites (DEMs) were highlighted. Globally, 1024 metabolites were iden ing 27 downregulated and 44 upregulated in Ce679 vs. R4233 (Figure 8a). pme0008, mws0473, mwsmce257, pmb3042, Lmmn002260, mws1346, and were significantly different in the two varieties (Table S12). We found tha Hmpn005101 was more than three times higher in Ce679 than in R4233, and contents were twice higher in Ce679 than in R4233. Subsequently, the KEG revealed that the categories "global and overviewed maps", "amino acid and "biosynthesis of other secondary metabolites" were the more represen (Figure 8b). . The x-axis indicates the log 2 ratio of genes, and the y-axis the log 2 ratio of metabolites. Black dotted lines indicate the threshold. Each gene indicates a metabolite or gene. Black dots denote the unchanged genes or metabolites; green dots indicate DAMs with unchanged genes; blue dots represent DEGs with unchanged metabolites; DEGs and DAMs are shown by red dots, divided into 9 quadrants from top to bottom and left to right with black dotted lines. The quadrants 1 , 2 , and 4 indicate that the expression abundance of metabolites is higher than that of genes; the quadrants 3 and 7 indicate that the expression patterns of genes are consistent with the metabolites; the quadrant The correlation analysis was carried out on the differentially accumulated metabolites (DAMs) and DEGs. The variations in the metabolites and their corresponding genes with the Pearson correlation coefficient over 0.99 were selected to draw nine quadrant diagrams and the correlation coefficient clustered heat map. As shown in Figure 7, the higher number of DAMs and DEGs were in the seventh and ninth quadrants. A positive correlation between DAMs and DEGs were observed in quadrant 9, while a negative was observed in quadrant 7.
By comparing replicated samples from Ce679 and R4233, the differentially expressed metabolites (DEMs) were highlighted. Globally, 1024 metabolites were identified, including 27 downregulated and 44 upregulated in Ce679 vs. R4233 (Figure 8a). The levels of pme0008, mws0473, mwsmce257, pmb3042, Lmmn002260, mws1346, and Hmpn005101 were significantly different in the two varieties (Table S12). We found that the level of Hmpn005101 was more than three times higher in Ce679 than in R4233, and Lmmn002260 contents were twice higher in Ce679 than in R4233. Subsequently, the KEGG annotation revealed that the categories "global and overviewed maps", "amino acid metabolism", and "biosynthesis of other secondary metabolites" were the more represented pathways (Figure 8b). Hmpn005101 was more than three times higher in Ce679 than in R4233, and Lmmn002260 contents were twice higher in Ce679 than in R4233. Subsequently, the KEGG annotation revealed that the categories "global and overviewed maps", "amino acid metabolism", and "biosynthesis of other secondary metabolites" were the more represented pathways (Figure 8b).  The DEM was significantly enriched in biosynthesis-related KEGG pathways, including the biosynthesis of amino acids, arginine, and proline metabolism (p ≤ 0.05; Figure 9). The biosynthesis of amino acids was the most significantly enriched pathway. The L-Serine, L-Valine, L-Threonine, L-Homoserine, L-Asparagine, L-Glutamine, L-Lysine, DL-2-Aminoadipic acid, L-Citrulline, 2-Isopropylmalic Acid, and 3-Phospho-D-glyceric acid metabolites were found to be related to the biosynthesis of amino acids; however, γ-Aminobutyric acid, 4-Guanidinobutanal, N-Acetylputrescine, Agmatine, 4-Acetamidobutyric acid, and 4-Guanidinobutyric acid were related to the arginine and proline metabolism (Figure 9).

Genes Associated with the GNPP in a Young Panicle
RNA-Seq data from young panicles were mapped to the reference genome Oryza Sativa Japonica (http://ftp.ensemblgenomes.org/pub/plants/release-49/fasta/oryza_sativa/ dna/, accessed on 9 March 2021); 94.61% and 94.54% of reads were successfully mapped from R4233 and Ce679 samples, respectively (Table 7). The analysis of the three replicated samples collected from young panicles in Ce679 and R4233 allowed the identification of 1562 DEGs, of which 824 and 738 were up and downregulated, respectively, in R4233 vs. Ce679 (Figure 10a). Further, these DEGs were significantly enriched for key metabolism-associated KEGG categories, including "global and overview maps," "carbohydrate metabolism", "biosynthesis of other secondary metabolites", "amino acid metabolism", "lipid metabolism", "signal transduction", and "transport and catabolism" (Figure 10b). Among all DEGs, a total of six galactose metabolism-related genes were selected, including STS1, GIF1, Os06g0675700, OsUGE1, RFS2, and RS5. The levels of RFS2 in R4233 were nine-fold more than those in Ce679, while the RS5 expression levels in R4233 were thrice more than those in Ce679 (Table S13).

Combined Analysis of DEGs and DEMs
The heat map showed that the six selected galactose-related genes could be divid into two groups with contrasting metabolites regulation. Group I with three gen (Os01g0580300, Os01g0589000, Os12g0102100), were upregulated in 27 metabolites a down-regulated in 44 metabolites, while group II also with three genes (Os01g06009 Os01g0580500, Os12g0591000) were upregulated in 44 metabolites and downregulated 27 metabolites (Figure 11). A subsequent analysis revealed that the genes in group I we downregulated in R4233 in three replicated samples and upregulated in Ce679, while t group II genes were upregulated in R4233 and downregulated in Ce679 (Figure 12).
In our findings, the biosynthesis of amino acids was the most significantly enrich

Combined Analysis of DEGs and DEMs
The heat map showed that the six selected galactose-related genes could be divided into two groups with contrasting metabolites regulation. Group I with three genes (Os01g0580300, Os01g0589000, Os12g0102100), were upregulated in 27 metabolites and down-regulated in 44 metabolites, while group II also with three genes (Os01g0600900, Os01g0580500, Os12g0591000) were upregulated in 44 metabolites and downregulated in 27 metabolites (Figure 11). A subsequent analysis revealed that the genes in group I were downregulated in R4233 in three replicated samples and upregulated in Ce679, while the group II genes were upregulated in R4233 and downregulated in Ce679 (Figure 12).
Biomolecules 2022, 12, 918 1 - Figure 11. Correlation heat map between 6 DEGs and 71 metabolites; the significant level cated with "*"; *, indicates p < 0.05; **, indicates p < 0.01; unmarked, indicates no significance We carried out the co-expression network analysis (Pearson correlation coeffi 0.8 or ≤−0.8, p-value ≤ 0.05 (Table S14)) of DEMs and DEGs to highlight the relati between DEGs and DEMs in young panicles between Ce679 and R4233. The DEG DEMs in Ce679 vs. R4233 showed that two genes, Os12g0102100 and Os01g058050 12 metabolites were significantly enriched in three metabolic pathways (biosynth secondary metabolites, cysteine and methionine metabolism, fatty acid biosynt Os12g0102100 was related to fatty acid biosynthesis and the main product called m acid (Figure 13a). Os01g0580500 coded for the enzyme 1-aminoclopropane-1-carbo oxidase (OsACO7). Except for fatty acid biosynthesis, the co-expression network of and DEMs in Ce679 vs. R4233 were mainly enriched in the biosynthesis of prima secondary metabolites (e.g., L-Serine, L-Valine, L-Threonine, L-Homoserine, L-Pip Acid, Tryptamine, DL-2-Aminoadipic acid, 2-Isopropylmalic Acid, 3-Phospho-D-g acid, D-Pantothenic Acid, Os01g0580500) ( Figure 13a) and cysteine and methioni tabolism (e.g., L-Serine, L-Homoserine*, L-Methionine Sulfoxide) (Figure 13a). The showed that the GNPP could be affected by the co-expression of DEGs and DEMs r to fatty acid biosynthesis, biosynthesis of the secondary metabolites, and cysteine an thionine metabolism. The canonical correlation analysis showed that the Os01g0 (ACO7) gene had a high correlation with DL-2-Aminoadipic acid and L-Homoseri Os12g0102100 (At3g45770) gene had a high correlation with L-Methionine Sulfoxid ure 13b). In our findings, the biosynthesis of amino acids was the most significantly enriched pathway from the metabolite analysis. Moreover, it was revealed that L-Lysine, L-Threonine, L-Homoserine, L-Serine, L-Asparagine, L-Valine, DL-2-Aminoadipic acid, L-Glutamine, and L-Citrulline increased from Ce679 to R4233. In contrast, the levels of 2-Isopropylmalic acid and 3-Phospho-D-glyceric acid were reduced (Figure 9b, Table S12).
We carried out the co-expression network analysis (Pearson correlation coefficient > 0.8 or ≤−0.8, p-value ≤ 0.05 (Table S14)) of DEMs and DEGs to highlight the relationship between DEGs and DEMs in young panicles between Ce679 and R4233. The DEGs and DEMs in Ce679 vs. R4233 showed that two genes, Os12g0102100 and Os01g0580500, and 12 metabolites were significantly enriched in three metabolic pathways (biosynthesis of secondary metabolites, cysteine and methionine metabolism, fatty acid biosynthesis). Os12g0102100 was related to fatty acid biosynthesis and the main product called myristic acid (Figure 13a). Os01g0580500 coded for the enzyme 1-aminoclopropane-1-carboxylate oxidase (OsACO7). Except for fatty acid biosynthesis, the co-expression network of DEGs and DEMs in Ce679 vs. R4233 were mainly enriched in the biosynthesis of primary and secondary metabolites (e.g., L-Serine, L-Valine, L-Threonine, L-Homoserine, L-Pipecolic Acid, Tryptamine, DL-2-Aminoadipic acid, 2-Isopropylmalic Acid, 3-Phospho-D-glyceric acid, D-Pantothenic Acid, Os01g0580500) ( Figure 13a) and cysteine and methionine metabolism (e.g., L-Serine, L-Homoserine*, L-Methionine Sulfoxide) (Figure 13a). The results showed that the GNPP could be affected by the co-expression of DEGs and DEMs related to fatty acid biosynthesis, biosynthesis of the secondary metabolites, and cysteine and methionine metabolism. The canonical correlation analysis showed that the Os01g0580500 (ACO7) gene had a high correlation with DL-2-Aminoadipic acid and L-Homoserine; the Os12g0102100 (At3g45770) gene had a high correlation with L-Methionine Sulfoxide (Figure 13b).
to fatty acid biosynthesis, biosynthesis of the secondary metabolites, and cysteine and methionine metabolism. The canonical correlation analysis showed that the Os01g0580500 (ACO7) gene had a high correlation with DL-2-Aminoadipic acid and L-Homoserine; the Os12g0102100 (At3g45770) gene had a high correlation with L-Methionine Sulfoxide (Figure 13b).

Validation of Transcriptome Data
We selected six genes, Os01g0600900, Os12g0102100, Os01g0580300, Os01g0580500, Os01g0591000, and Os01g0589000, to analyze their expression pattern in Ce679 vs. R4233 at the panicle initiation stage to validate the transcriptome experiment results (Figure 14). The RT-qPCR results indicated that the selected genes' expression pattern was consistent with the RNA-seq data, having similar expression trends despite the quantitative difference in the expression level.

Validation of Transcriptome Data
We selected six genes, Os01g0600900, Os12g0102100, Os01g0580300, Os01g0580500, Os01g0591000, and Os01g0589000, to analyze their expression pattern in Ce679 vs. R4233 at the panicle initiation stage to validate the transcriptome experiment results (Figure 14). The RT-qPCR results indicated that the selected genes' expression pattern was consistent with the RNA-seq data, having similar expression trends despite the quantitative difference in the expression level.

Discussion
The GNPP is a primary agronomic trait that directly determines rice grain yield. Rice

Discussion
The GNPP is a primary agronomic trait that directly determines rice grain yield. Rice grain constituted one of the principal targets during artificial selection, and improving this trait has been integrated into the selection strategies by breeders and molecular biologists. It is profoundly affected by the panicle architecture-related components, such as the length of the central rachis and the number of primary and secondary rachis branches. The current study investigated two indica rice cultivars: Ce679 (low GNPP) and R4233 (high GNPP). R4233 is a restorer line developed from the successive backcrossing of Ce679. After evaluation, the results showed that it has gained some superior agronomic traits from its parents in addition to Pi9 for disease resistance. The combination of blast disease resistance and grain yield is essential in breeding. Although other traits such as flag leaf length and width that also showed a significant difference in this study are important for photosynthesis, the grain number remains the ultimate trait that can directly impact the food security. Thus, it is reasonable for breeders to seek a cultivar with a better yield.
Since environmentally stable QTLs could be applied in a wide range of circumstances, the segregating population for QTL identification related to the GNPP was developed in the experimental field of Guangxi University and in Bobai Southern China during different growing periods. The quantification of different agronomic traits revealed remarkable differences in the panicle structure in R4233 compared with Ce679. We observed similar results for the GNPP in R4233 regardless of the location, which is important in breeding. The number of secondary branches per panicle was significantly higher in R4233 than the number of primary branches in Ce679 ( Table 2), suggesting that the secondary branches significantly contributed to the total branches and grain number. In our results, secondary branches contributed 82.3%, 86.6%, and 87.1% of the total branches in Ce679, R4233, and F1, respectively. The numbers of primary rachis branches and secondary rachis branches between Ce679 and R4233 were significantly different, but no difference was found concerning the length of the central rachis. Therefore, the GNPP in R4233 was significantly higher than Ce679 (Figure 1). Conversely, the number of primary branches per panicle was not significantly different in F1 compared to Ce679. At the same time, this difference was significant in the number of secondary branches per panicle and GNPP. This result indicated that the development of the secondary branch per panicle might be the primary component that affects the GNPP in Ce679 and R4233, causing the grain number to double in R4233 compared with Ce679. In rice, panicle development is critical in grain production; mainly the transition to the reproductive phase, which involves the transformation of the shoot apical meristem (SAM) into the inflorescence meristem (IM). During this period, several lateral meristems (LM) initiate and grow as primary rachis branches (PRBs), which further produce next-order LM that grow as secondary rachis branches (SRBs). Later, the lateral spikelet will differentiate from the new LM, and the terminal spikelets are converted from rachis branch meristems. Several studies reported a positive regulation of the GNPP through the development of only the secondary rachis branch [11,19,46,47], or simultaneously primary and secondary rachis branch-related genes [13,18,48,49]. However, none of the previously identified genes have been found to directly relate to the number of grains in the present study, suggesting another mechanism contributing to the GNPP.
In a recent study, Guo et al. [24] reported that the completion of mapping and cloning of the Ctb1 gene related to the chilling resistance in rice took over 16 years with conventional breeding and intensive labor [24]. This is because it involved several steps such as fine mapping, map-based cloning, and high-density linkage maps [24]. This procedure has been simplified over the last decades with the development of high-throughput technologies, and the researcher can save much time, labor, and money. Lately, several "omics" techniques, which include, genomics, transcriptomics, proteomics, and metabolomics, have become effective technologies for plant functional genomic and breeding research [50]. Our research took advantage of this technology to accelerate the identification of genes and pathways which take 4 years where more time is needed. Omics research is also undergoing a shift from a single-omics to a large-scale multi-omics approach. In most of the previous research, a single-omics approach was used [51]. For instance, researchers employed BSA-Seq as an effective approach to identify the minor genes with various sequencing depths because of the precision and sensitivity of the sequencing. RNA-Seq was used to identify new genes and SNP loci, measure gene expression levels, and calculate fold changes in DEG. Proteomics was simply to detect gene products and metabolomics measured how proteins are expressed, and the pathway of metabolites, which influence how genes display the biochemical phenotype of the cell. The limitation of the single-omics approach is that it may not help to obtain a deeper understanding of the fundamental biological processes, a more accurate prediction of the response variable, and gain further insight into mechanistic aspects of the system [52]. So, an integration of the different omics approaches is required to envisage overall comprehension of the gene, product, and phenotype under a set of conditions. This approach is gaining more interest and has been successfully used. Recently, in many crops, physiological activity, agronomic traits, responses to biotic and abiotic stress, and yield have been well documented via the use of integrative omic approaches. This robust approach has superseded conventional phenomics, resulting in a formidable tool for crop genetics and breeding sciences [53]. Combining genetic data with prospective phenotyping technologies may offer information on complicated features to help improve crops [54]. The combination of BSA-seq and RNA-seq allowed the identification of the candidate for the agronomic trait [24,28,29,55]. This approach enhanced the accurate identification of gene candidates for the grain number in rice [26], where BSA-Seq alone would identify only the candidate interval. The combination of a transcriptome and metabolome analysis helped to predict molecular mechanisms of genes, and gene networks in crop science. For example, Wang et al. (2019) deciphered the complex response mechanisms involved in heat stress in pepper [56]. The large number of specific response of genes and metabolites highlighted the complex regulatory mechanisms and metabolite networks related to various pathways associated with cold stress after combining a transcriptome and metabolome analysis in tobacco [57] and wheat [54]. Several gene and metabolite networks have been revealed as essential for melatonin-mediated salt tolerance in rice through transcriptome and metabolome investigations [58]. However, the main inconvenience remains the huge datasets generated, which require bioinformatic tools for data mining and organizing [59]. Furthermore, in some case it is necessary to carry out some additional molecular works for functional validation of the candidate gene such as RNAi and Crispr/cas9.
In the current study, we applied a multi-omics approach to investigate the GNPP. BSA-Seq was used to identify QTLs' position. Further, integration of the BSA-Seq analysis and RNA-sequencing to mine QTL related to spikelet grain number at panicle initiation showed thirty-one DEGs, fourteen of which were located on chromosome 1, four on chromosome 5, eight on chromosome 10, and five on chromosome 12. The result reflected the Manhattan plot analysis from BSA-seq, indicating the success of this joint point approach. A KEGG enrichment analysis of these 31 DEGs and 71 differentially enriched metabolites was performed. Two genes, Os12g0102100 and Os01g0580500, and 12 metabolites were significantly enriched in 3 metabolic pathways. Os12g0102100, the alcohol dehydrogenase superfamily zinc-containing protein, is a novel gene, and the contribution to the GNPP is not yet elucidated. This gene was downregulated in the cultivar 4233 and upregulated in Ce679, highlighting its negative effect on the GNPP. The Os01g0580500 code for the enzyme 1-aminoclopropane-1-carboxylate oxidase (OsACO7) is responsible for the final step of the ethylene biosynthesis pathway through the conversion of 1-aminocyclopropane-1-carboxylic acid (ACC) into ethylene in flowering plants under aerobic conditions. This gene was highly expressed in R4233 and downregulated in Ce679, suggesting that this gene positively regulated the GNPP through ethylene synthesis. Increasing evidence indicated that several hormones synthesized by plants, including cytokinin (CK), auxin, abscisic acid (ABA), and ethylene, played a crucial role in developing the panicle and indirectly the GNPP in rice by regulating transcriptional and post-transcriptional activities of GNPP-related genes [6,7,60]. A previous study showed that OsACO7 was linked to ethylene biosynthesis, enhancing the resistance of young rice plants to the infection of blast fungus [61]. Ethylene is a group of plant growth regulators involved in coordinating numerous plant development processes such as germination, growth, ripening, senescence, and biotic and abiotic stress responses. A recent study showed that it also contributed to important agronomic traits in rice, including the regulation of panicle architecture, grain filling rate, and size [60]. Yin and coworkers (2015), revealed that ethylene deficient mutant mhz5/crtiso had smaller panicles, fewer branches, and more excessive tillers than wild-type plants. In addition, several studies reported that the difference in filling rate between the higher and lower spikelets resulted from the level of ethylene in the two parts. A cultivar with a compact panicle had higher ethylene content than lax-panicle rice [62][63][64]. These suggested that the level of ethylene at the panicle initiation stage may positively affect the development of the primary and secondary branches in R4233.
We comprehensively assessed the endogenous metabolites involved in these processes. Our results revealed that OsACO7 interacted with 11 metabolites classified into four groups: amino acids and derivatives (the most prominent group), alkaloids, organic acids, and others sharing two essential pathways. Among the twelve metabolites, ten were found in the biosynthesis of secondary metabolites, suggesting that this pathway contributed substantially to the development of primary and secondary branches of the panicle. This result was in line with Ke et al. (2018), who found that the biosynthesis of secondary metabolites contributed to rice panicle development. Secondary metabolites are generally regarded as indispensable to maintaining normal metabolism and completing the normal life cycle in the plant. Cysteine and methionine metabolism pathways involved three of twelve metabolites: L-Serine, L-Homoserine, and L-Methionine Sulfoxide. Although only three metabolites were identified in this pathway, they seemed to contribute because they can be considered the primary substrate from which ethylene is synthesized ( Figure 15). L-Serine and L-Homoserine were also identified in the biosynthesis of the secondary metabolites' pathway, suggesting that these metabolites were essential in ethylene biosynthesis and the GNPP. Juan and co-authors (2014) reported that ethylene was synthesized from Sadenosylmethionine (SAM) through 1-aminocyclopropane-1-carboxylic acid (ACC), and L-serine was the primary source of one-carbon units for methylation reactions that occurred with the generation of S-adenosylmethionine [65]. Other studies also revealed that ethylene was produced from methionine [66,67]. Although only three metabolites were identified in this pathway, they seemed to contribute because they can be considered the primary substrate from which ethylene is synthesized ( Figure 15). L-Serine and L-Homoserine were also identified in the biosynthesis of the secondary metabolites' pathway, suggesting that these metabolites were essential in ethylene biosynthesis and the GNPP. Juan and co-authors (2014) reported that ethylene was synthesized from S-adenosylmethionine (SAM) through 1-aminocyclopropane-1-carboxylic acid (ACC), and L-serine was the primary source of one-carbon units for methylation reactions that occurred with the generation of S-adenosylmethionine [65]. Other studies also revealed that ethylene was produced from methionine [66,67]. Another essential pathway was fatty acid biosynthesis, in which the gene Os12g0102100 coding for mitochondrial trans-2-enoyl-CoA reductase was implicated in the biosynthesis of myristic acid, also called tetradecanoic acid, which is a long-chain saturated fatty acid. Qin and colleagues (2007) showed that saturated, very-long-chain fatty acids promote the development of cotton fiber and the elongation of cells in arabidopsis through the activation of the biosynthesis of ethylene. A previous study reported that myristic acid is vital in cell regulation because it modifies the number of proteins through acylation and N-myristoylation in the signal transduction cascade [68,69]. The high expression of this gene increases the production of miristic acid, which may interfere with the ethylene production pathway in Ce679, leading to the repression of the development process of secondary branches. The differential regulatory mechanism may coincide with the specific ethylene responses to secondary and primary branches' development, suggesting a possibility for a novel ethylene-GNPP regulatory mechanism in rice.

Conclusions
The present study coupled different omics approaches to identify potential candidate genes for the GNPP, perform the functional analysis, and investigate the overall pathway Another essential pathway was fatty acid biosynthesis, in which the gene Os12g0102100 coding for mitochondrial trans-2-enoyl-CoA reductase was implicated in the biosynthesis of myristic acid, also called tetradecanoic acid, which is a long-chain saturated fatty acid. Qin and colleagues (2007) showed that saturated, very-long-chain fatty acids promote the development of cotton fiber and the elongation of cells in arabidopsis through the activation of the biosynthesis of ethylene. A previous study reported that myristic acid is vital in cell regulation because it modifies the number of proteins through acylation and N-myristoylation in the signal transduction cascade [68,69]. The high expression of this gene increases the production of miristic acid, which may interfere with the ethylene production pathway in Ce679, leading to the repression of the development process of secondary branches. The differential regulatory mechanism may coincide with the specific ethylene responses to secondary and primary branches' development, suggesting a possibility for a novel ethylene-GNPP regulatory mechanism in rice.

Conclusions
The present study coupled different omics approaches to identify potential candidate genes for the GNPP, perform the functional analysis, and investigate the overall pathway in which there is a possible interaction. Our findings revealed that two main QTLs controlled the frequency distribution of the GNPP in the F 2 population deriving from the crossing between Ce679 (low GNPP) and R4233 (high GNPP) in these cultivars. This result indicates that Os12g0102100 and Os01g0580500 might be crucial at panicle initiation, and the GNPP was enhanced through ethylene biosynthesis in rice, which has never been reported. The different phenotypic changes may reflect plant-specific responses to ethylene. Further validation studies, including genetic transformation, RNA interference, and overexpression, may disclose the physiological mechanism which regulates the GNPP in rice. This research showed that a multi-omics analysis is a complementary approach that could help to find important candidate genes for functional investigation. This robust approach is a formidable tool for crop genetics and breeding sciences.
Supplementary Materials: The following supporting information can be downloaded at: https://www. mdpi.com/article/10.3390/biom12070918/s1. Table S1: Primer sequence for qRT-PCR; Table S2: Construction of bulk pools based on grain number per panicle (GNPP) marked per plant; Table S3: SNP detection and annotation through BSA-Seq analysis; Table S4: Indel detection and annotation of BSA-seq analysis; Table S5: SNPs density in different pools; Table S6: Indels density in different pools; Table S7: The genes annotated in the candidate regions associated with the GNPP identified by the ∆SNP-index method; Table S8: The genes annotated in the candidate regions associated with the GNPP identified by the ∆Indel-index method; Table S9: The genes annotated in the final candidate region associated with the GNPP; Table S10: 738 expressed genes in RNA-seq result from BSA-seq; Table S11: 31 DEGs expression; Table S12: The differential expression metabolites were between Ce679 and R4233; Table S13: Information about DEGs of galactose-related gene; Table S14: The information of co-expression network of DEGs and DEMs.