Transcriptome Dataset of Soybean (Glycine max) Grown under Phosphorus-Deficient and -Sufficient Conditions

This data descriptor introduces a transcriptome dataset of the low-phosphorus-tolerant soybean (Glycine max) variety NN94-156 grown under phosphorus-deficient and -sufficient conditions. The dataset comprises four libraries acquired from roots and leaves of soybean plants challenged with low phosphorus, which allows further analysis of whether a systemic tolerance response to low-phosphorus stress occurred. We describe in detail how the plants were prepared and treated and how the data were generated and pre-processed. Further analyses of these data should improve our understanding of the molecular mechanisms of the low-phosphorus stress response in soybean.

Data Set: https://www.ncbi.nlm.nih.gov/sra/?term=SRP097715

Data Set License: There is no specific license.


Introduction
Phosphorus (P) is an essential fertilizer component that is critical for plant production. Because there is no known chemical or technological substitute for P in either natural ecosystems or agroecosystems, and because the mobility of P in soil is very limited [1], P is a global constraint on crop productivity in some crop-growing areas, especially regions that lack both sufficient soil P and the financial means to obtain P-containing fertilizers. Fertilizers can provide season-long P for crop growth, but their use is not economically effective for farmers. The development and use of crops with higher P-use efficiency is an economical and environmentally friendly route to sustainable crop production [2].
Soybean (Glycine max (L.) Merr.) is one of the most important crops, providing about half of the global demand for vegetable oils and proteins. Soybean growth and development have been shown to require more P than those of other crops such as rice, corn, and wheat [3]. In soybean, low-P stress may decrease nodule development, increase flower and pod abscission, and impair overall plant growth, consequently limiting yield and seed quality [4][5][6][7]. Thus, low-P stress is more problematic than other nutrient deficiencies or toxicities in soybean [8].
Soybean tolerance to low-P stress is a complex trait involving a number of genes, some of which may have small effects. Knowledge of the genetic and molecular basis of soybean tolerance to low-P stress has thus far come from the identification of a number of quantitative trait loci (QTLs) associated with P efficiency [4,5,9] and of a major gene, GmACP1, which encodes an acid phosphatase involved in regulating P efficiency [5]. Despite great efforts, it remains challenging to pinpoint the P-efficiency genes underlying previously identified QTLs, which have relatively large confidence intervals. Here, we expanded our previous studies [4,5,9] by conducting a transcriptome analysis of the low-P-tolerant variety Nannong 94-156 under phosphorus-deficient and -sufficient conditions. Our objective was to better understand the genetic and molecular basis of low-P tolerance in soybean. Further studies might functionally identify the candidate genes and introduce the low-P tolerance genes into soybean.

Result
A total of four RNA libraries (Roots+P, Leaves+P, Roots−P, and Leaves−P), one for each of the four conditions, were prepared and sequenced. Transcriptome sequencing generated approximately 21.1 million (M), 23.5 M, 21.3 M, and 22.3 M reads for the Roots+P, Leaves+P, Roots−P, and Leaves−P conditions, respectively. In this dataset, 94% of the samples had quality scores greater than 30. Preprocessing with TopHat [10] showed that approximately 77%-86% of high-quality reads mapped to unique locations on the soybean reference genome. The sequencing and preprocessing results are summarized in Table 1. An initial comparative analysis of the transcriptome dataset revealed that acid phosphatases might be involved in enhancing P efficiency in low-P-tolerant soybean, and gene expression and functional analyses have indicated the robustness of the dataset [11]. However, further analyses of the data are needed, for example, to identify alternatively spliced genes and regulatory networks and to identify differentially expressed genes systemically involved in low-P tolerance in roots and leaves. Such analyses will further improve our understanding of the molecular mechanisms underlying plant adaptation to phosphate deficiency.

Archived Data Accessible to Users
This is the first publicly available dataset of soybean transcriptome changes in response to low-P stress. Filtered sequence reads for the four libraries were submitted to the National Center for Biotechnology Information (NCBI, https://www.ncbi.nlm.nih.gov/) Sequence Read Archive (SRA) database in compressed FASTQ format. The sequencing data are available under accession numbers SRR5281855-58. The dataset can be retrieved using the SRA Toolkit (http://www.ncbi.nlm.nih.gov/sra) and converted into the data formats required by downstream tools. Further analysis of the dataset could be performed using the software tools and databases summarized at OmicTools (https://omictools.com).
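As an illustration of the retrieval step, the following minimal Python sketch expands the accession range given above into individual run accessions and builds SRA Toolkit command lines for them. It assumes that the SRA Toolkit programs `prefetch` and `fastq-dump` are installed and on the PATH; the helper names `expand_accessions` and `download_commands` are hypothetical and not part of any toolkit. The sketch only prints the commands rather than running them.

```python
from typing import List


def expand_accessions(spec: str) -> List[str]:
    """Expand a range such as 'SRR5281855-58' into individual accessions."""
    first, _, last_suffix = spec.partition("-")
    if not last_suffix:
        return [first]
    prefix = first.rstrip("0123456789")      # e.g. 'SRR'
    start = first[len(prefix):]              # e.g. '5281855'
    # The text after the dash replaces the trailing digits of the first number.
    end = start[: len(start) - len(last_suffix)] + last_suffix
    width = len(start)
    return [f"{prefix}{n:0{width}d}" for n in range(int(start), int(end) + 1)]


def download_commands(accession: str) -> List[List[str]]:
    """Build (but do not run) SRA Toolkit commands for one run accession."""
    return [
        ["prefetch", accession],
        # --split-files writes the two mates of paired-end reads to separate
        # files; --gzip compresses the output.
        ["fastq-dump", "--split-files", "--gzip", accession],
    ]


if __name__ == "__main__":
    for acc in expand_accessions("SRR5281855-58"):
        for cmd in download_commands(acc):
            print(" ".join(cmd))
```

Each printed command can be pasted into a shell, or passed to `subprocess.run(cmd, check=True)` once the toolkit is configured.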

Plant Material and Treatment
The soybean cultivar NN94-156 (B20) is a low-P-tolerant variety that was used for linkage mapping in our previous studies, in which several QTLs explaining a high proportion of phenotypic variation were identified [4,5]. The phosphorus-deficient and -sufficient treatments followed previously described procedures [5,11]. Briefly, seeds were surface-sterilized with 0.5% sodium hypochlorite for no more than 4 min, rinsed twice with sterile water, and then germinated in sterile vermiculite. Seedlings with fully expanded cotyledons were transferred, two plants per hole, into a 60-hole hydroponic tank (70 × 50 × 30 cm) [12] filled with 50% Hoagland's nutrient solution [13] (pH 5.8) supplemented with 500 µM KH2PO4 (+P). Three days after transplanting, 60 plants were transferred to a separate tank containing 50% Hoagland's nutrient solution with a deficient P supply (−P, 5 µM P), and the remaining 60 plants stayed in the +P solution as controls. Growing more plants than were needed for the RNA-seq assays allowed us to select phenotypically identical plants for sample collection, which helps reduce variation between individuals. All plants were placed in the hydroponic tanks using a completely randomized block design. The used solution was replaced with the corresponding fresh solution every three days. Whole roots and the uppermost mature trifoliate leaf were collected separately from control and treated plants at 1, 3, 7, and 14 days after transfer, flash frozen in liquid nitrogen, and stored at −70 °C in a refrigerator until use.
We chose these four time points because we found that GmACP1 [5], an acid phosphatase-encoding gene that contributes to soybean tolerance to low-phosphorus stress, was significantly upregulated after treatment at all four time points, with the most dramatic difference occurring at 7 days after treatment (data not shown). To capture as much as possible of the transcriptome change associated with low-P tolerance, we collected tissues from treated and control plants at the four time points and pooled them for the RNA-seq assay. Tissues collected from three phenotypically identical plants at each time point were pooled and powdered, and equal amounts of tissue (by weight) from the four time points per condition were pooled for RNA extraction. Therefore, a total of four samples, roots (+P), roots (−P), leaves (+P), and leaves (−P), were prepared. All plants were grown under controlled conditions (10 h light/14 h dark, day/night temperature of 28/20 °C) in a growth chamber.
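The pooling scheme described above can be summarized in a short Python sketch. It enumerates, for each of the four libraries, the (time point, plant) tissue collections that contribute to the pool. The plant indices are illustrative labels only; the text does not state which individual plants were sampled, and the names used here are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

TISSUES = ("roots", "leaves")
CONDITIONS = ("+P", "-P")
DAYS_AFTER_TRANSFER = (1, 3, 7, 14)
PLANTS_PER_TIME_POINT = 3  # phenotypically identical plants pooled per time point


def pooling_scheme() -> Dict[str, List[Tuple[int, int]]]:
    """Map each library to the (day, plant index) collections pooled into it."""
    scheme = {}
    for tissue, condition in product(TISSUES, CONDITIONS):
        library = f"{tissue} ({condition})"
        scheme[library] = [
            (day, plant)
            for day in DAYS_AFTER_TRANSFER
            for plant in range(1, PLANTS_PER_TIME_POINT + 1)
        ]
    return scheme


if __name__ == "__main__":
    for library, members in pooling_scheme().items():
        print(f"{library}: {len(members)} tissue collections pooled")
```

Under this reading, each library pools 12 tissue collections (4 time points × 3 plants), so the dataset captures an averaged response over time rather than time-resolved or replicate-level expression.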

RNA Isolation, Library Preparation, and RNA Sequencing
Total RNA was extracted using a Trizol kit according to the manufacturer's protocol (Promega, Fitchburg, WI, USA). For library construction, RNA integrity, purity, and concentration were assessed using an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). Sequencing libraries were prepared from 1 µg of RNA per sample using the NEBNext® Ultra™ RNA Library Prep Kit for Illumina® (New England Biolabs, Ipswich, MA, USA) and the NEBNext® Multiplex Oligos for Illumina® kit (New England Biolabs, Ipswich, MA, USA) as previously described [14]. The final quantified libraries were sequenced on an Illumina HiSeq™ 2500, generating 100-bp paired-end raw reads.

Data Preprocessing
The quality of all reads was assessed by running the FastQC program (version 0.11.5, http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). For quality control of raw reads, a custom Perl program was written to select clean reads by removing low-quality sequences (reads in which more than 50% of bases had a quality score below 20), reads with more than 5% N (unknown) bases, and reads containing adaptor sequences. Similar quality control results could also be obtained with widely used bioinformatics tools such as Trimmomatic [15], FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/), and cutadapt [16]. The clean reads from each library were aligned to the Williams 82 soybean reference genome (Wm82.a2.v1) using TopHat [10] with a minimum intron size of i = 30 and a maximum intron size of I = 15000, as previously described [17]; all other parameters were left at their defaults. Differentially expressed genes could be identified using toolkits such as Cufflinks [18], DESeq [19], and edgeR [20], following their user manuals. For those who are not familiar with command-line operations, these bioinformatics tools can also be run through user-friendly web interfaces, such as Galaxy (https://usegalaxy.org) and the Discovery Environment at CyVerse (http://www.cyverse.org).
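The filtering criteria and alignment settings above can be sketched in Python as follows. This is not the authors' Perl program, only a minimal illustration of the stated thresholds. It assumes Phred+33 quality encoding and uses a commonly seen Illumina adapter prefix as a placeholder, since the adaptor sequences actually screened are not given. The function names and file paths are hypothetical; `-i`, `-I`, and `-o` are TopHat's options for minimum intron length, maximum intron length, and output directory.

```python
from typing import List, Sequence

# Placeholder adapter prefix (assumption; not taken from the source).
EXAMPLE_ADAPTERS = ("AGATCGGAAGAGC",)


def is_clean_read(
    seq: str,
    qual: str,
    adapters: Sequence[str] = EXAMPLE_ADAPTERS,
    min_quality: int = 20,
    max_low_quality_fraction: float = 0.5,
    max_n_fraction: float = 0.05,
) -> bool:
    """Return True if a read passes the three filters described in the text."""
    if not seq or len(seq) != len(qual):
        return False
    seq = seq.upper()
    # Filter 1: more than 50% of bases with quality below 20 (Phred+33 assumed).
    low = sum(1 for ch in qual if ord(ch) - 33 < min_quality)
    if low / len(seq) > max_low_quality_fraction:
        return False
    # Filter 2: more than 5% unknown (N) bases.
    if seq.count("N") / len(seq) > max_n_fraction:
        return False
    # Filter 3: read contains an adaptor sequence.
    if any(adapter in seq for adapter in adapters):
        return False
    return True


def tophat_command(index_prefix: str, reads_1: str, reads_2: str, out_dir: str) -> List[str]:
    """Build a TopHat command with the intron-size limits given in the text."""
    return [
        "tophat",
        "-i", "30",       # minimum intron size
        "-I", "15000",    # maximum intron size
        "-o", out_dir,
        index_prefix, reads_1, reads_2,
    ]


if __name__ == "__main__":
    print(is_clean_read("ACGT" * 25, "I" * 100))
    print(" ".join(tophat_command("Wm82_index", "clean_1.fq", "clean_2.fq", "tophat_out")))
```

For paired-end data, one reasonable convention (an assumption here, since the source does not specify it) is to discard a pair when either mate fails `is_clean_read`, so that the two FASTQ files stay synchronized.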

Table 1. Summary of sequencing reads and transcriptome statistics.