Proteomic Analysis and Identification of Possible Allergenic Proteins in Mature Pollen of Populus tomentosa

Pollen grains from Populus tomentosa, a tree widely cultivated in northern China, are considered an important aeroallergen that causes severe allergic diseases. To gain insight into their allergenic components, proteins from mature Populus tomentosa pollen were analyzed by two-dimensional gel electrophoresis (2-DE) and matrix-assisted laser desorption/ionization-time of flight mass spectrometry (MALDI-TOF/TOF MS). A total of 412 spots from mature pollen were resolved on pH 4–7 immobilized pH gradient (IPG) strips, and 159 distinct proteins were identified from the 242 spots analyzed. The identified proteins were categorized by their functional role in the pollen, including energy regulation, protein fate, protein synthesis and processing, metabolism, defense/stress responses, development and other functional categories. Among the identified proteins, 27 were predicted to be putative allergens using the Structural Database of Allergenic Proteins (SDAP) tool and Allergen Online. The expression patterns of these putative allergen genes indicate that several of them are highly expressed in pollen. The identified putative allergens have the potential to improve specific diagnosis and may be used to develop vaccines for immunotherapy against poplar pollen allergy.


Introduction
Pollen grains are the dispersal agents of sperm cells and play a vital role in the sexual reproduction of higher plants. After release from the anther, pollen grains are carried by wind, insects and other agents to the stigma of a carpel, where their primary function is to deliver sperm cells to the female gametophyte through a pollen tube, leading to seed and fruit production [1,2]. During pollen development (microsporogenesis and microgametogenesis), a large number of genes are expressed in a coordinated manner in different tissues of the anther, with roles in cell signaling, cytoskeleton formation, cell wall metabolism and vesicle transport [3,4]. Proteins account for 2.5% to 61% of pollen dry mass, and some of them act as allergens upon inhalation [5]. To date, a large amount of information on pollen allergens from diverse plant species is available in allergen databases such as Allergome [6] and the Structural Database of Allergenic Proteins (SDAP) [7]. However, pollen allergens are restricted to a few protein families and show distinct

Proteomic Maps of P. tomentosa Mature Pollen
Total proteins extracted from mature P. tomentosa pollen were separated by 2-DE on pH 4-7 IPG strips and stained with CCB, and the resulting gels were aligned and matched. A total of 412 reproducible protein spots were detected, resolved over a molecular weight (MW) range of 5 to 120 kDa and an isoelectric point (pI) range of 4 to 7 (Figure 2). All detected protein spots were processed by automated in-gel tryptic digestion and MALDI-TOF/TOF MS/MS analysis. A total of 242 protein spots, representing 159 different proteins, were searched by BLAST against the Populus trichocarpa proteome (v3.0), which was downloaded from the Phytozome 10.3.1 website [20]. Note, however, that 83 proteins were associated with different spots. The calculated MW of the identified proteins ranged from approximately 8.94 to 199.37 kDa and the calculated pI from approximately 4.11 to 12.21; the PACid number and corresponding P. trichocarpa gene locus are listed in Table S1. Most calculated values are close to the experimental values, as judged from the positions of the spots on the 2-DE gels. However, the identified proteins did not always show a one-to-one correspondence with the spots on the gels, and some deviated in molecular mass and pI. These discrepancies may result from several factors: polypeptide variants that are encoded by the same gene but appear in different spots; post-translational modifications in vivo (e.g., phosphorylation, glycosylation, acetylation or methylation) that do not significantly affect the MW of a protein but shift its pI on the gel [21][22][23]; translation from alternatively spliced mRNAs [24]; partial synthesis of proteins during pollen maturation [25]; or chemical modification of the proteins during sample preparation.

The pollen grain suspension in the liquid germination medium was spread on the dishes. After incubation at 21 °C for 12 and 48 h, the pollen germination rate was determined under a light microscope. The experiments were repeated three times and three replicates (dishes) were carried out.

Functional Classification of Identified Proteins
To assign functional information to the identified proteins, we classified them into functional categories according to their gene sequences and homology comparisons with other known proteins [2,12,26]. The 242 identified proteins were classified into 12 functional groups (Figure 3, Table S1). Approximately 70% of them fell into four categories: energy (23.14%), protein fate (17.77%), protein synthesis and processing (16.12%) and metabolism (12.39%). This suggests a special requirement for proteins involved in energy and general metabolism, such as protein synthesis, oxidative phosphorylation, carbohydrate metabolism and sugar metabolism. These proteins were also reported in rice and Arabidopsis pollen [1,11,23]. The high percentage of proteins related to energy metabolism correlates well with the large number of mitochondria observed in mature P. tomentosa pollen. Pollen germination and tube growth are well known to be high-energy-requiring processes that need most of these proteins, yet the mature P. tomentosa pollen analyzed here had not germinated. The other functional categories were defense/stress responses (7.02%), development (5.37%), cytoskeleton (2.07%), cell fate (2.07%), signal transduction (1.65%), transport (0.83%), cell structure (0.41%) and unclassified proteins (11.16%). Twenty-seven of the 242 proteins could not be functionally classified because they did not contain any known conserved domains.
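The category shares can be checked against the total of 242 with a few lines of arithmetic. The sketch below is illustrative only: the percentages are taken from the text, while the conversion back to whole counts by rounding is an assumption about how the percentages were derived, and the variable names are hypothetical.

```python
# Percentages as reported in the text; 242 is the number of classified proteins.
TOTAL = 242

category_percent = {
    "energy": 23.14,
    "protein fate": 17.77,
    "protein synthesis and processing": 16.12,
    "metabolism": 12.39,
    "defense/stress responses": 7.02,
    "development": 5.37,
    "cytoskeleton": 2.07,
    "cell fate": 2.07,
    "signal transduction": 1.65,
    "transport": 0.83,
    "cell structure": 0.41,
    "unclassified": 11.16,
}


def percent_to_count(percent, total=TOTAL):
    """Convert a rounded percentage back to the nearest whole count (assumption)."""
    return round(percent * total / 100)


category_count = {name: percent_to_count(p) for name, p in category_percent.items()}

top_four = ["energy", "protein fate", "protein synthesis and processing", "metabolism"]
top_four_percent = sum(category_percent[name] for name in top_four)

for name, count in category_count.items():
    print(f"{name:35s} {count:4d}")
print("sum of counts:", sum(category_count.values()))
print(f"top four categories: {top_four_percent:.2f}%")
```

Under this rounding assumption the counts add up to 242, the four largest categories account for 69.42% (the "approximately 70%" quoted above), and the unclassified share corresponds to 27 proteins.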
In addition, gene ontology (GO) assignments were performed to functionally classify these proteins. GO provides a dynamic, controlled vocabulary and hierarchical relationships for representing information on biological process, molecular function and cellular component (Figure 4). In terms of biological process, metabolic process (GO:0008152, 88 proteins) was the most represented GO term, followed by cellular process (GO:0009987, 76 proteins) and primary metabolic process (GO:0044238, 59 proteins). In molecular function, proteins with catalytic activity (

Prediction of Allergens in P. tomentosa Mature Pollen
Pollens are one of the leading causes of respiratory allergic sensitization [27]. In spring, poplars release large amounts of pollen that may trigger allergic responses. To date, the sequences and structures of many pollen allergens have been characterized, and they are restricted to a few protein families [8]. They share common characteristics that contribute to their ability to bind IgE and trigger an allergic reaction [28]. To identify potential allergens in mature P. tomentosa pollen, the 242 identified proteins were screened with the SDAP tool and Allergen Online. SDAP is a web server that provides rapid, cross-referenced access to the sequences, structures and IgE epitopes of allergenic proteins [7], and Allergen Online is a more frequently updated allergen database. In this study, 27 proteins identified in poplar pollen were predicted to be putative allergens (Table 1). For example, eleven heat shock protein 70 (Hsp70) proteins (spots 4, 5, 7, 8, 9, 10, 11, 12, 13, 21 and 31) and four small Hsps (spots 172, 173, 217 and 218) corresponded to allergenic molecules. Hsp70 has been shown to bind human IgE from patients with cystic echinococcosis [29] and from patients allergic to corn and barley [30], and antigenic cross-reactivity of Hsp70 with a 70 kDa component was demonstrated by amino acid sequence alignment in Penicillium citrinum [31]. A class I small heat shock protein detected on a 2D gel has been reported to be an allergen in soybean [32]. Spots 25, 49, 50, 53, 54, 59, 62, 71, 72, 222 and 241 correspond to enolase, a ubiquitous glycolytic enzyme that is a highly conserved allergen in various fungi and in latex, for example in Cladosporium herbarum (Cla h 6), Alternaria alternata (Alt a 6) and Curvularia lunata (rCur l 2) [33,34]. Spots 156, 159 and 160 were identified as pollen Ole e 1 allergen, a well-characterized allergenic protein that shares 24-34% amino acid sequence homology with pollen proteins from maize, tomato, ryegrass, birch, rice, Arabidopsis and other species, and is thought to control pregermination and pollen tube emergence [35]. Spots 55 and 210 were identified as thioredoxin, which belongs to a novel cross-reactive cereal allergen family that might contribute to the symptoms of baker's asthma and might be related to grass pollen allergy [36]. Weichel et al. [36] identified wheat thioredoxin hB (Tri a 25) by screening a cDNA phage display library against immobilized serum IgE from 8 bakers with occupational asthma. It shares high homology with maize thioredoxins (ZmTRXh1 and ZmTRXh2) and human thioredoxin, and the family includes cross-reactive members that might be relevant for patients occupationally exposed to inhalant allergens. Spots 197 and 202 correspond to profilin 3 and profilin 5, respectively. Profilins are actin-binding proteins present in all eukaryotic cells; in plants they are ubiquitous and act as pan-allergens. The profilin family is one of the main causes of cross-reactivity between pollen and plant-derived foods [37], and its clinical allergenicity, albeit variable, is well recognized in both respiratory and food allergy [38]. Plant profilins have a highly conserved structure that provokes multiple positive sIgE responses in sensitized patients [39]. Spot 145 corresponds to triosephosphate isomerase, which has been described as an allergen in wheat, latex and lychee [40]. Whether the remaining predicted allergens cause allergic reactions still needs further study. More importantly, the IgE-binding properties of these putative allergens should be analyzed by immunoblotting with sera from patients with pollen allergy. This would confirm IgE recognition of the putative allergens as well as their cross-reactivity with pollen allergens from other species [40].

In our previous study, we identified 28 possible allergenic proteins in P. deltoides CL. '2KEN8' [26]. Here, we compared the candidate allergens of P. tomentosa and '2KEN8' (Figure S1). Sixteen possible allergenic proteins were present in the mature pollen of both, including Hsp, enolase, pollen Ole e 1, profilin and thioredoxin. Besides these 16 shared proteins, the two had 23 putative allergens that were not shared. These differences may be due to the methods of protein extraction, the selection of spots for analysis and the poplar species.
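The overlap figures quoted for the two poplars follow from simple set arithmetic. The sketch below is a consistency check, not the authors' analysis; the function and key names are hypothetical, and only the three input numbers (27, 28 and 16) come from the text.

```python
def overlap_summary(n_first, n_second, n_shared):
    """Derive the unshared and combined counts from two set sizes and their overlap."""
    only_first = n_first - n_shared
    only_second = n_second - n_shared
    return {
        "only_first": only_first,
        "only_second": only_second,
        "not_shared": only_first + only_second,
        "combined": n_first + n_second - n_shared,
    }


# 27 putative allergens in P. tomentosa, 28 in '2KEN8', 16 present in both.
summary = overlap_summary(27, 28, 16)
print(summary)
```

With these inputs, 11 putative allergens are specific to P. tomentosa and 12 to '2KEN8', which gives the 23 unshared proteins reported above.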

Expression Profiles of the Predicted Allergen Genes in Different Tissues
To examine whether the predicted pollen allergen genes are expressed in poplar and to study their expression patterns, Zhang et al. [26] previously reported the global expression patterns of 28 predicted poplar allergen genes (including the 16 putative allergen genes present in the mature pollen of both P. tomentosa and '2KEN8') across various tissues, based on Affymetrix microarray data (GSE21481). Among the 16 putative allergen genes, two (Potri.001G392400.1 and Potri.011G111300.1), corresponding to spots 156, 159 and 160 (Pollen Ole e 1 allergen and extensin family protein), had high transcript levels in male catkin, suggesting specific expression in pollen. In this study, we analyzed the expression patterns of the 11 putative allergen genes found only in P. tomentosa across various tissues using the same microarray data. However, four of these genes (Potri.005G015100.1, Potri.013G009500.1, Potri.012G090900.1 and Potri.013G089200.1) could not be matched to the corresponding data. This may be because improvements to the poplar genome removed some genes with incorrect functional annotation, or for other reasons. The expression patterns of the other 7 poplar allergen genes are shown in Figure 5. Three poplar allergen genes, corresponding to spots 9 (Potri.010G205700.1, Hsp 70 family protein), 48 (Potri.002G189900.1, Aldehyde dehydrogenase 2B7) and 194 (Potri.016G024700.1, Calmodulin 6), had high transcript levels across the different tissues. One gene (Potri.003G143600.1), corresponding to spots 4, 8 and 21 (Hsp 70 family protein), was highly expressed in RFF and AxB, and another (Potri.009G022300.1), corresponding to spot 215 (Cystatin B), was highly expressed in ApB, ML, RTC, RFF, SE and G43h. These putative allergen genes might therefore play important roles not only in reproduction but also in vegetative development. Thus, our data contribute to the identification of new pollen allergenic proteins.

To further confirm the expression profiles of the predicted allergen genes and to verify the reliability of the microarray data, qRT-PCR analysis was performed on root, stem, leaf and pollen for 9 genes that had high relative expression levels in the microarray data (Figure 6). The root, stem and leaf served as control tissues for studying tissue-specific expression. The qRT-PCR results show that Potri.001G392400.1 and Potri.011G111300.1, corresponding to spots 156, 159 and 160 (Pollen Ole e 1 allergen and extensin family protein), were highly expressed in pollen, and the microarray data show that these genes were highly expressed in male catkin [26]. Three putative allergen genes (Potri.010G205700.1, Potri.002G189900.1 and Potri.016G024700.1) had high transcript levels across the different tissues. However, some results were not consistent with the microarray data, possibly because the pollen used for qRT-PCR was a purer tissue than the male catkin used in the microarray analysis, because different poplar species were used, or for other reasons. In general, the qRT-PCR results were in good agreement with the microarray data sets analyzed in this study.
Figure 6. Expression patterns of putative allergen genes in different tissues. Nine predicted allergen protein genes were randomly selected and the expression levels in leaf, stem, root and pollen were analyzed using qRT-PCR. The error bars were calculated from three replicates.

Plant Materials and Pollen Isolation
For biological replicates, three uniformly developed flowering branches were collected from one genotype of P. tomentosa in a nursery of the Chinese Academy of Forestry, transferred to buckets filled with water and cultured in a greenhouse at an average temperature of 22 °C and a relative humidity of 70-75% under a 16 h light/8 h dark photoperiod. Mature pollen grains were collected from freshly anther-dehisced flowers by shaking the tassel over a glass petri dish and dried at 37 °C, and any debris was removed with a needle. Pollen samples were used immediately or pooled in a tube, frozen in liquid nitrogen and stored at −80 °C until further study.

In Vitro Pollen Germination Assay
The viability of each pollen sample was tested using the "thin liquid layer" germination method with some modifications [41]. Briefly, pollen grains were sown on dishes containing the liquid germination medium and incubated at 21 °C in the dark. The liquid germination medium was composed of 15% (w/v) sucrose, 100 mg/L H3BO3, 300 mg/L CaCl2, 200 mg/L MgSO4 and 100 mg/L KNO3, and the pH was adjusted to 6.0. After incubation for 2, 4, 8, 12, 24 and 48 h, germination rates were determined under a light microscope; a pollen grain was scored as germinated when its pollen tube was longer than the diameter of the grain. Each sample was observed in 5 fields of view, and at least 30 pollen grains were analyzed in each field. The experiment was repeated three times and three replicates (dishes) were carried out.
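The scoring rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: the criterion (tube longer than grain diameter), the 5 fields and the minimum of 30 grains per field come from the text, whereas averaging the per-field rates, the function names and the measurements are assumptions made for the example.

```python
from statistics import mean

FIELDS_PER_SAMPLE = 5       # from the text: 5 fields of view per sample
MIN_GRAINS_PER_FIELD = 30   # from the text: at least 30 grains per field


def is_germinated(tube_length_um, grain_diameter_um):
    """A grain counts as germinated when its tube is longer than its diameter."""
    return tube_length_um > grain_diameter_um


def field_germination_rate(grains):
    """Percent germinated in one field; grains is a list of (tube, diameter) pairs."""
    if len(grains) < MIN_GRAINS_PER_FIELD:
        raise ValueError("each field needs at least 30 scored grains")
    germinated = sum(is_germinated(tube, diameter) for tube, diameter in grains)
    return 100.0 * germinated / len(grains)


def sample_germination_rate(fields):
    """Mean of the per-field rates (an assumption; pooling grains is an alternative)."""
    if len(fields) != FIELDS_PER_SAMPLE:
        raise ValueError("expected 5 fields of view per sample")
    return mean(field_germination_rate(field) for field in fields)


# Hypothetical measurements in micrometres: 18 of 30 grains have a long tube.
example_field = [(60.0, 30.0)] * 18 + [(10.0, 30.0)] * 12
rate = sample_germination_rate([example_field] * FIELDS_PER_SAMPLE)
print(f"germination rate: {rate:.1f}%")
```

A grain whose tube length exactly equals its diameter is not counted as germinated here, which matches the "longer than" wording of the criterion.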

Preparation of Total Protein Extracts
Total soluble protein from mature pollen was isolated using the trichloroacetic acid and acetone (TCA-A) method with slight modifications [26]. Briefly, the pollen grains were ground to a fine powder in liquid nitrogen and transferred to cold protein extraction buffer containing 10% (w/v) TCA and 0.07% (v/v) β-mercaptoethanol in acetone, incubated overnight at −20 °C, and then centrifuged at 14,000× g at 4 °C for 30 min. The precipitate was washed three times with the same cold extraction buffer without TCA; each wash consisted of incubation at −20 °C for 1 h followed by centrifugation at 14,000× g at 4 °C for 30 min. The resulting pellets were vacuum-dried, weighed and stored at −80 °C until use. Each experiment was carried out with three biological replicates.

Two-Dimensional Gel Electrophoresis (2-DE)
The vacuum-dried protein samples were dissolved in a lysis buffer containing 7 M urea, 2 M thiourea, 4% (w/v) CHAPS, 65 mM dithiothreitol (DTT) and 0.2% carrier ampholytes for 1 h at room temperature, with vortexing every 10 min, and the homogenate was centrifuged at 15,000× g for 20 min. The protein concentration of the supernatant was determined by the Bradford assay with bovine serum albumin as the standard [42]. 2-DE was performed following the protocol described by Sheoran et al. [1] and Zhang et al. [26]. Briefly, protein samples (600 µg) were diluted in a rehydration buffer for 12 h. Isoelectric focusing (IEF) was performed using the Ettan III system (GE Healthcare, Chicago, IL, USA) and 18 cm Immobiline Dry Strips (pH 4-7, GE Healthcare). After IEF, the strips were treated in an equilibration buffer, placed on top of vertical sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) gels and sealed with agarose containing bromophenol blue. The gels were run in running buffer in a PROTEAN II XL multi-cell (Bio-Rad, Hercules, CA, USA) electrophoresis tank at a constant current of 10 mA for 30 min, and then at 30 mA until the tracking dye reached the bottom of the gels. Three representative gels per sample were used for further analysis.

Gel Staining and Image Analysis
The gels were fixed overnight in 50% (v/v) ethanol and 10% (v/v) acetic acid, washed three times with Milli-Q water for 10 min each, and stained with Colloidal Coomassie Blue G-250 (CCB) solution for 12 h [1]. After rinsing with water, the gels were digitized with a calibrated scanner (UMAX Powerlook 2100 XL; UMAX, Taiwan), annotated, and analyzed for spot number and spot volume using Image Master 2D Platinum Software (Version 6.0; Amersham Biosciences, Uppsala, Sweden). Three replicate gels were run for each of three different pooled pollen samples, and protein spots observed consistently in replicate gels were selected for further analysis.

In-Gel Digestion and Mass Spectrometry
After 2-DE, the protein spots were manually cut from the gels and rinsed twice with Milli-Q water, destained with 100 mM Na2S2O3 and 30 mM K3Fe(CN)6, dehydrated with 25 mM NH4HCO3 and 50% (v/v) acetonitrile (ACN), reduced with 10 mM DTT, alkylated with 55 mM iodoacetamide, and then completely dried under vacuum. Protein digestion was performed with trypsin (Mass grade, Promega, Madison, WI, USA) using a MassPREP protein digest station (Micromass, Manchester, UK), with overnight incubation at 37 °C. The resulting tryptic digests were then analyzed on a MALDI-TOF/TOF tandem mass spectrometer, the ABI 4800 proteomics analyzer (Applied Biosystems, Framingham, MA, USA). To acquire the mass spectra, 0.4 µL samples were mixed with equal volumes of matrix solution containing 0.5 M α-cyano-4-hydroxycinnamic acid (CHCA), 50% (v/v) ACN and 0.05% (v/v) trifluoroacetic acid (TFA) and spotted onto a MALDI plate. Spectra were acquired in the 800-4000 m/z range and analyzed with 4000 Series Explorer Software v3.5 (AB SCIEX, Foster City, CA, USA) in MS/MS batch-processing mode. Peaks were detected with a minimum S/N ratio ≥10 and a cluster area S/N threshold ≥40, without smoothing or raw spectrum filtering. Peptide precursor ions corresponding to contaminants, including keratin and trypsin autolytic products, were excluded with a mass tolerance of 0.5 Da.
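The peak selection criteria above can be illustrated with a small filter. This is a sketch under stated assumptions rather than the behavior of the 4000 Series Explorer software: the m/z window, the S/N cutoff and the 0.5 Da tolerance are taken from the text, the cluster area S/N threshold is not modeled, and the contaminant masses and peak list are illustrative placeholders, not the exclusion list used in the study.

```python
MZ_MIN, MZ_MAX = 800.0, 4000.0      # acquisition window from the text
MIN_SN = 10.0                       # minimum S/N ratio from the text
CONTAMINANT_TOLERANCE_DA = 0.5      # exclusion tolerance from the text

# Illustrative placeholder masses only; the actual exclusion list is not given.
ILLUSTRATIVE_CONTAMINANT_MZ = [842.51, 2211.10]


def keep_peak(mz, sn, contaminants=ILLUSTRATIVE_CONTAMINANT_MZ):
    """Return True if a peak passes the window, S/N and contaminant checks."""
    if not (MZ_MIN <= mz <= MZ_MAX):
        return False
    if sn < MIN_SN:
        return False
    # Drop peaks lying within the tolerance of any listed contaminant mass.
    return all(abs(mz - c) > CONTAMINANT_TOLERANCE_DA for c in contaminants)


def filter_peaks(peaks):
    """peaks is a list of (m/z, S/N) pairs; returns the pairs that pass."""
    return [(mz, sn) for mz, sn in peaks if keep_peak(mz, sn)]


# Hypothetical peak list.
peaks = [
    (750.2, 50.0),    # below the m/z window
    (842.6, 120.0),   # within 0.5 Da of a listed contaminant
    (1296.7, 35.0),   # kept
    (1570.7, 8.0),    # S/N too low
    (2211.4, 60.0),   # within 0.5 Da of a listed contaminant
    (3100.5, 12.0),   # kept
]
selected = filter_peaks(peaks)
print(selected)
```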

Protein Identification and Allergen Prediction
The peptide mass data were uploaded to Protein Pilot software v3.0 (Applied Biosystems, Framingham, MA, USA) and the MASCOT search engine [43], and searched against the Populus trichocarpa genome database v3.0, the NCBI non-redundant protein database and the Swiss-Prot database. The following parameters were used for database searching: trypsin as the proteolytic enzyme, allowing one missed cleavage; carbamidomethylation of cysteine as a fixed modification; and oxidation of methionine as a variable modification. Proteins were accepted as positive identifications with a Mowse score greater than 60 at the 95% confidence level. The identified proteins were categorized by function according to data from Blast2GO [44]. The output GO terms were then slimmed in REVIGO and treemaps were produced [45]. Allergens were predicted using SDAP [7], based on sequence similarity (>35%) between the proteins identified here and reported allergens and on the presence of at least eight consecutive amino acids shared with known allergens [7], and using Allergen Online [6].
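The two sequence criteria can be made concrete with a short sketch. It is not the SDAP or Allergen Online implementation: the 35% threshold and the eight-residue window come from the text, the percent similarity is assumed to be supplied by an external alignment tool, the two flags are reported separately because the text does not state how they are combined, and all sequences and names are hypothetical.

```python
SIMILARITY_THRESHOLD = 35.0   # percent, from the text
MIN_CONTIGUOUS = 8            # consecutive amino acids, from the text


def kmers(sequence, k=MIN_CONTIGUOUS):
    """All contiguous substrings of length k (empty set if the sequence is shorter)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def shares_contiguous_match(query, allergen, k=MIN_CONTIGUOUS):
    """True if query and allergen share at least one identical stretch of k residues."""
    return not kmers(query, k).isdisjoint(kmers(allergen, k))


def screen(query, allergen, percent_similarity):
    """Report both criteria; percent_similarity is computed elsewhere (assumption)."""
    return {
        "similarity_above_threshold": percent_similarity > SIMILARITY_THRESHOLD,
        "contiguous_match": shares_contiguous_match(query, allergen),
    }


# Hypothetical sequences and similarity values, for illustration only.
query = "MKTAYIAKQRQISFVKSHFSRQ"
known_allergen = "GGSAYIAKQRQILLP"
unrelated = "MSDEEVKLLAAGG"

hit = screen(query, known_allergen, percent_similarity=41.0)
miss = screen(query, unrelated, percent_similarity=12.0)
print(hit)
print(miss)
```

Comparing sets of 8-mers keeps the contiguous-match check simple and fast, at the cost of reporting only whether a shared stretch exists rather than where it lies.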

Microarray Data Analyses
The microarray data for various tissues are available in the NCBI Gene Expression Omnibus (GEO) database [46]. The series accession number GSE21481 (for P. trichocarpa) was used for the tissue-specific expression analysis. Probe sets corresponding to the selected genes were identified using the online Probe Match tool POParray. For genes with one or more probe sets, the median of the expression values was used. The expression values were normalized with the GeneChip Robust Multiarray Analysis (GCRMA) algorithm, followed by log transformation and averaging.
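The summarization step can be sketched as follows. This is an illustration under assumptions, not the authors' pipeline: GCRMA normalization itself is not reimplemented, the input is taken to be already normalized intensities, the base-2 logarithm and the order of operations (log transform, average replicates, then median across probe sets) are assumptions, and the probe identifiers and values are hypothetical.

```python
import math
from statistics import mean, median


def gene_expression(probe_sets):
    """Summarize one gene in one tissue.

    probe_sets maps a probe-set identifier to a list of normalized
    intensities, one per replicate array.
    """
    # Log-transform each replicate and average within a probe set.
    per_probe = [mean(math.log2(value) for value in values)
                 for values in probe_sets.values()]
    # Take the median across the probe sets mapped to the gene.
    return median(per_probe)


# Hypothetical normalized intensities for three probe sets of one gene.
example = {
    "probe_a": [256.0, 1024.0],
    "probe_b": [64.0, 64.0],
    "probe_c": [16.0, 4.0],
}
value = gene_expression(example)
print(value)
```

For a gene with a single probe set the median is simply that probe set's averaged value, so the same function covers both cases mentioned in the text.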

RNA Extraction and Quantitative Reverse Transcription Polymerase Chain Reaction (qRT-PCR) Analysis
For RNA isolation and qRT-PCR, the leaves, stems, roots and pollen of P. tomentosa were harvested, immediately frozen in liquid nitrogen and stored at −80 °C until analysis. Total RNA was isolated using the RNeasy Plant Mini Kit (Qiagen, Hilden, Germany) with on-column treatment using RNase-free DNase I (Qiagen) according to the manufacturer's instructions to ensure that no genomic DNA contamination remained. First-strand cDNA was synthesized from approximately 1 µg of purified total RNA using the SuperScript III reverse transcription kit (Invitrogen, Carlsbad, CA, USA). qRT-PCR was performed on the LightCycler 480 Detection System (Roche, Penzberg, Germany) with two PtoActin genes as internal references. The details of the primers used are listed in Table S2. The reaction mixture (20 µL) contained 10 µL of 2× SYBR Green Real-time PCR Master Mix (TaKaRa, Dalian, China), 0.5 µM each of the forward and reverse primers, and 2 µL of cDNA template. Amplification used the following cycling parameters: 95 °C for 30 s, followed by 40 cycles of 95 °C for 5 s and 60 °C for 30 s, then 60 °C for 60 s and 50 °C for 30 s. qRT-PCR was carried out in triplicate (technical repeats) to ensure the reproducibility of the results. The relative expression ratios were calculated from the threshold cycle according to the delta-delta CT method [47].
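The delta-delta CT calculation can be written out explicitly. The sketch below shows the standard form of the method, assuming roughly 100% amplification efficiency; combining the two reference genes by the arithmetic mean of their CT values is an assumption, since the text does not say how the two PtoActin genes were combined, and all CT values and names are hypothetical.

```python
from statistics import mean


def delta_ct(target_ct, reference_cts):
    """CT of the target minus the mean CT of the reference genes (assumption)."""
    return target_ct - mean(reference_cts)


def relative_expression(sample_target_ct, sample_reference_cts,
                        calibrator_target_ct, calibrator_reference_cts):
    """Fold change of the sample relative to the calibrator, 2^(-ddCT)."""
    ddct = (delta_ct(sample_target_ct, sample_reference_cts)
            - delta_ct(calibrator_target_ct, calibrator_reference_cts))
    return 2.0 ** (-ddct)


# Hypothetical CT values, each assumed to be the mean of three technical repeats.
pollen_target, pollen_refs = 20.0, [18.0, 18.0]
leaf_target, leaf_refs = 25.0, [18.5, 17.5]

fold = relative_expression(pollen_target, pollen_refs, leaf_target, leaf_refs)
print(f"pollen vs leaf fold change: {fold:.1f}")
```

In this example the target gene comes up five cycles earlier in pollen than in leaf after normalization, which corresponds to a 32-fold higher relative expression.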

Conclusions
In summary, this study presents a comprehensive proteomic analysis of mature P. tomentosa pollen and a set of candidate allergenic proteins. A total of 412 protein spots were resolved by 2-DE, and 159 different proteins were identified from 242 protein spots using MALDI-TOF/TOF MS/MS analysis. Furthermore, 27 proteins were identified as putative allergens, including heat shock proteins, enolase, pollen Ole e 1 allergen, thioredoxin and profilins, and their expression patterns across different tissues were analyzed using Affymetrix microarray data and qRT-PCR. To our knowledge, this is the first report on the identification of possible allergenic proteins from P. tomentosa pollen. After further studies involving purification, recombinant protein expression and epitope mapping, the identified putative allergens may serve as candidates for the development of hypoallergenic vaccines and of innovative methods for immunotherapy and component-resolved diagnosis of P. tomentosa pollen allergy.