Search for New Allergens in Lolium perenne Pollen Growing under Different Air Pollution Conditions by Comparative Transcriptome Study

The relationship between air pollution and the allergenic capacity of pollen is widely accepted: pollen from more polluted environments tends to be more allergenic. To our knowledge, this is the first study to compare the differential expression of Lolium perenne pollen genes by RNA-seq in two wild populations exposed to different levels of air pollution. The objective is to identify proteins that are differentially expressed between the two situations and to relate them to increased allergenic capacity. Two populations of L. perenne (Madrid and Ciudad Real) were studied over two consecutive years, under the rationale that genes overexpressed in Madrid, where NO2 and SO2 levels are higher, could account for the greater allergenic capacity of its pollen. Heat shock proteins (HSPs), glycoside hydrolases, proteins with leucine-rich repeat motifs, and proteins with EF-hand motifs were consistently overexpressed in Madrid pollen in both years. Interestingly, some genes were overexpressed in only one of the years, such as pectinesterases in the first year and lipid transfer proteins (LTPs) and thaumatin in the second. Although the allergenic potential of all these proteins has been reported in other species, this is the first time they are proposed as possible allergens of L. perenne. These results can contribute to the knowledge of L. perenne allergens and their relationship with atmospheric pollution, and to the development of more effective vaccines.


Introduction
Most immunoglobulin E (IgE)-mediated allergies are caused by plant allergens, which can produce symptoms such as rhinoconjunctivitis, edema, urticaria, asthma, and anaphylaxis [1]. The incidence of pollen allergy is currently increasing markedly, and pollen allergens are the main cause of allergy among people with perennial allergic rhinitis [2,3]. Allergies and hypersensitivity responses initiated by specific immunologic mechanisms triggered by pollen and other allergens constitute one of the major health issues in modern societies [4].
The relationship between air pollution and the increased allergenic capacity of pollen is widely accepted [5][6][7][8][9][10][11]. The relationship between the allergenic capacity of pollen, the degree of air pollution, and the physiological status of the plant has recently been demonstrated [12], revealing that plants growing under higher atmospheric pollution had lower photosynthetic efficiency, with altered

Results
To construct a transcriptome database, six mRNA libraries were generated at each collection time (May 2017 and May 2018) and sequenced on the Illumina platform: three from the Ciudad Real population of L. perenne and three from the Madrid population. Table 1 summarizes the mapping results. Of the approximately 52 million reads (101 bp) obtained on average per sample, 53% were mapped. Mapping rates were uniform across samples, indicating that the samples were comparable. Typically, 60% to 90% of reads can be aligned to the reference genome; however, this rate depends on sample quality and on the coverage of the reference genome, and the highest percentages are obtained with well-curated genomes of model organisms. In 2017, a total of 666,180 genes were identified after sequencing, mapping, expression normalization, and differential expression analysis, but 550,285 of them could not be analyzed because no fold change or adjusted p-value was obtained in the normalized expression analysis. The remaining 115,895 genes were available for the gene expression analysis, of which 107,599 had an adjusted p-value above 0.05. When the pollen of the two cities was compared in 2017, 366 genes were common to both cities (no significant differences in expression), 2443 genes were significantly overexpressed in the pollen from Madrid, and 5487 were significantly overexpressed in the pollen from Ciudad Real (Figure 1). In 2018, a total of 666,181 genes were identified after the same steps, but 585,368 could not be analyzed because no fold change or adjusted p-value was obtained in the normalized expression analysis. The remaining 80,813 genes were available for the gene expression analysis, of which 69,239 had an adjusted p-value above 0.05.
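The gene counts above follow a simple subtraction funnel. As a minimal sketch using only the figures reported in the text (the dictionary layout and the `funnel_counts` helper are hypothetical), the intermediate values can be made explicit:

```python
# Gene-filtering funnel for each sampling year; all numbers are taken from the text.
# "not_testable": genes without a fold change / adjusted p-value after normalization.
FUNNEL = {
    2017: {"identified": 666_180, "not_testable": 550_285, "padj_above_0_05": 107_599},
    2018: {"identified": 666_181, "not_testable": 585_368, "padj_above_0_05": 69_239},
}


def funnel_counts(year):
    """Return the number of testable genes and of genes with padj <= 0.05."""
    f = FUNNEL[year]
    testable = f["identified"] - f["not_testable"]
    significant = testable - f["padj_above_0_05"]
    return {"testable": testable, "padj_le_0_05": significant}


for year in FUNNEL:
    print(year, funnel_counts(year))
# 2017 {'testable': 115895, 'padj_le_0_05': 8296}
# 2018 {'testable': 80813, 'padj_le_0_05': 11574}
```

For 2017, the 8,296 genes remaining after the adjusted p-value filter equal the sum of the three groups shown in Figure 1 (366 + 2,443 + 5,487).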
When the pollen of the two cities was compared in 2018, 163 genes were common to both cities (no significant differences in expression), 7568 genes were significantly overexpressed in the pollen from Madrid, and 3823 were significantly overexpressed in the pollen from Ciudad Real (Figure 2). Table 2 lists, among the 50 most overexpressed genes (ranked by fold change) in Madrid and in Ciudad Real in each year, those with an annotation in at least one of the Gene Ontology (GO), KEGG Orthology (KO), Pfam, and Clusters of Orthologous Groups (COG) databases.
In both years, more annotated genes were found among those overexpressed in the pollen from Madrid (eight of the top 50 in 2017 and 25 of 50 in 2018) than among those overexpressed in Ciudad Real (two and eight, respectively).
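The Table 2 selection can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it assumes a signed fold change in which positive values mean higher expression in Madrid, and the `Gene` record, the `top_annotated` function, and the toy gene IDs are hypothetical.

```python
from dataclasses import dataclass, field

# Databases used to decide whether a gene has an annotation (from the text).
DATABASES = {"GO", "KO", "Pfam", "COG"}


@dataclass
class Gene:
    gene_id: str
    fold_change: float  # assumed signed: > 0 higher in Madrid, < 0 higher in Ciudad Real
    annotations: set = field(default_factory=set)  # databases with a hit, e.g. {"Pfam"}


def top_annotated(genes, n=50, madrid=True):
    """Take the n genes most overexpressed in one city (ranked by fold change)
    and keep only those annotated in at least one of GO, KO, Pfam, or COG."""
    if madrid:
        candidates = [g for g in genes if g.fold_change > 0]
        ranked = sorted(candidates, key=lambda g: g.fold_change, reverse=True)
    else:
        candidates = [g for g in genes if g.fold_change < 0]
        ranked = sorted(candidates, key=lambda g: g.fold_change)  # most negative first
    return [g for g in ranked[:n] if g.annotations & DATABASES]


# Toy example with hypothetical gene IDs.
genes = [
    Gene("g1", 12.0, {"Pfam"}),
    Gene("g2", 9.5),
    Gene("g3", -20.0, {"GO", "KO"}),
    Gene("g4", 3.0, {"COG"}),
]
print([g.gene_id for g in top_annotated(genes, n=2)])                # ['g1']
print([g.gene_id for g in top_annotated(genes, n=2, madrid=False)])  # ['g3']
```

Ranking first and filtering for annotations afterwards mirrors the counts reported above (for example, 8 annotated genes out of the top 50), rather than ranking only among annotated genes.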
In Ciudad Real, the overexpressed genes were related to primary metabolism; none was involved in the secondary (adaptive) metabolic pathways through which potentially allergenic molecules are synthesized. Conversely, some of the genes overexpressed in Madrid pollen were involved in secondary metabolism, especially in 2018. In 2017, genes of the shikimic acid pathway, the phenylpropanoid pathway, and heat shock proteins of family 20 (HSP20) were overexpressed. In 2018, almost all overexpressed genes were related to secondary metabolism, including 14 isoforms of HSP20, two isoforms of allergen 1 from Betula verrucosa (Bet v 1), and two lipid transfer proteins (LTPs). All of these protein types have been described as allergens.
The GO analysis (Figures 3 and 4) groups the most abundant genes into three categories: cellular component, molecular function, and biological process. In both years, the most abundant genes fell into the "cell part", "organelle", and "cell" subcategories of the cellular component category; the "catalytic activity" and "binding" subcategories of the molecular function category; and the "metabolic process" and "cellular process" subcategories of the biological process category. In 2018 only, the "biological regulation" and "response to stimulus" subcategories of the biological process category were also among the most abundant (Figure 4).
Among the genes upregulated in Madrid samples, ten gene families showed isoforms whose overexpression differed depending on the year (Table 3). The table gives the number of isoforms for the six families overexpressed in both years (HSP, glycoside hydrolase, leucine-rich repeat, EF-hand family, pollen allergen, and cofilin), the two overexpressed only in 2017 (pectinesterase and serpin), and the two overexpressed only in 2018 (lipid transfer protein and thaumatin). Table 3 follows the functional classification of the Pfam database (Supplementary Materials). Only genes overexpressed in Madrid were considered, because the greater allergenic capacity of Madrid pollen relative to Ciudad Real pollen, and its relationship with atmospheric pollution, have already been shown. The genes overexpressed in Madrid are therefore candidates to explain this greater allergenic capacity.
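The year-specific grouping used for Table 3 amounts to set intersection and differences. A minimal sketch, assuming each year's Madrid-upregulated families are available as sets of names (the variable and function names are hypothetical; the family lists come from the text):

```python
# Pfam-based families upregulated in Madrid pollen in each year, as listed in the text.
UP_2017 = {"HSP", "glycoside hydrolase", "leucine-rich repeat", "EF-hand",
           "pollen allergen", "cofilin", "pectinesterase", "serpin"}
UP_2018 = {"HSP", "glycoside hydrolase", "leucine-rich repeat", "EF-hand",
           "pollen allergen", "cofilin", "lipid transfer protein", "thaumatin"}


def group_by_year(first_year, second_year):
    """Split families into those upregulated in both years or in only one of them."""
    return {
        "both": first_year & second_year,
        "first_only": first_year - second_year,
        "second_only": second_year - first_year,
    }


groups = group_by_year(UP_2017, UP_2018)
for label, families in groups.items():
    print(label, sorted(families))
```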

Discussion
Allergic diseases have become a pandemic health problem, and pollen allergies are considered the most important of them [21]. Some studies have shown that most patients sensitized to pollen allergens have perennial allergic rhinitis [21,22]. Moreover, these diseases appear to be more prevalent in industrialized countries, and their incidence seems to be higher in polluted areas, especially areas with heavy traffic [4,12,23].
Some studies have shown an in situ allergic response in patients with negative skin prick test (SPT) results and undetectable serum IgE [24]. This clinical entity, known as local allergic rhinitis (LAR) [25], is considered a new phenotype of allergic rhinitis (AR) that must be differentiated from nonallergic rhinitis [26,27]. This discrepancy between negative systemic tests and a local allergic response could be related to several factors: (i) in most cases, the pollen to which patients are exposed is not the same as that used in skin prick tests; (ii) the number of allergens involved in allergic processes may be greater than described so far; and (iii) some allergens are probably expressed only under specific physiological conditions of the plant, which include the degree of atmospheric pollution [12].
The International Union of Immunological Societies (WHO/IUIS) Allergen Nomenclature Sub-committee (http://www.allergen.org/search.php?allergensource=Lolium+perenne) lists six allergens in Lolium perenne (Lol p 1, Lol p 2, Lol p 3, Lol p 4, Lol p 5, and Lol p 11). The first three are expansins, proteins involved in pollination whose role is to weaken the cell wall during pollen tube development.
This two-year RNA-seq study of differential expression in L. perenne pollen from two cities with different levels of atmospheric pollution aims to identify new allergens of L. perenne in order to make immunotherapy more effective. At the same time, identifying allergens related to higher air pollution levels could help explain why many patients with a negative skin test still show local signs of allergy.
Genes encoding L. perenne pollen allergen 1 (Lol p 1) were overexpressed in Madrid samples, but they were by no means the most overexpressed. The most overexpressed genes in Madrid pollen in both years were heat shock proteins (HSPs), specifically HSP20, HSP70, and HSP90, named for their approximate molecular masses of 20, 70, and 90 kDa (Table 2). Most of them are molecular chaperones whose function is to refold proteins damaged under stress [28]. They were first described in relation to heat stress but have since been reported in many other stress situations. Some of these proteins have been described as allergens in fungi, mites, chestnut (Cas s 9 is an HSP20), and hazelnut pollen (Cor a 10 is an HSP70) [29]. However, to date, the allergenicity of L. perenne pollen has not been related to HSPs.
Glycoside hydrolase genes (EC 3.2.1.-) were also highly overexpressed in both years (Table 2). Glycoside hydrolases are a widespread group of enzymes that hydrolyze the glycosidic bond between two or more carbohydrates and are classified into 100 different families according to sequence similarity [30][31][32]. Family 17 showed the highest expression levels in both years. Glycoside hydrolase family 17 includes enzymes with several activities, such as endo-1,3-beta-glucosidase (EC 3.2.1.39), lichenase (EC 3.2.1.73), and exo-1,3-glucanase (EC 3.2.1.58). To date, these enzymes have been found only in plants and fungi. Some plant glucanases have been described as allergens, e.g., in Hevea brasiliensis latex [32], olive pollen [33], and plant foods [34]. However, there are no bibliographic references to these enzymes as putative allergens in L. perenne.
Genes encoding leucine-rich repeat (LRR) proteins were also overexpressed in Madrid pollen in both years. LRRs are repeated sequences present in many proteins with diverse functions and cellular locations, and they are usually involved in protein-protein interactions. LRR domains are composed of beta-alpha units that form curved horseshoe structures, with a parallel beta sheet on the concave side and mostly helical elements on the convex side. They are often flanked by cysteine-rich domains [35,36]. An LRR-containing protein from wheat was identified by screening a phage display wheat cDNA library with IgE from wheat-allergic patients [37], but no such protein has been reported in L. perenne.
Genes encoding proteins with EF-hand motifs were also overexpressed in Madrid pollen in both years. EF-hand proteins bind Ca2+ and chelate cytosolic calcium to regulate calcium homeostasis [38]. The major EF-hand-containing proteins are calcium-dependent protein kinases (CDPKs/CPKs), calcineurin B-like proteins (CBLs), calmodulin-like proteins (CMLs), and calmodulins (CaMs). Allergens of this type have been described in the pollen of many plants, such as Alnus glutinosa (Aln g 4), Brassica napus (Bra n polcalcin), Chenopodium album (Che a 3), and Olea europaea (Ole e 3 and Ole e 8). However, none has been reported in L. perenne.
Some genes were overexpressed in Madrid in only one of the two years. In the first year, genes coding for pectinesterases (seven isoforms) were overexpressed. These enzymes play an important role in cell wall metabolism during fruit ripening [39], and Sal k 1, a pectin methylesterase from Salsola kali pollen, has been shown to be a major allergen [40].
In the second year, seven genes coding for LTP isoforms and six coding for thaumatin isoforms were overexpressed in Madrid. Both are pathogenesis-related (PR) proteins with a well-known allergenic capacity: LTPs belong to family 14 (PR-14) and thaumatins to family 5 (PR-5). To date, the International Union of Immunological Societies Allergen Nomenclature Sub-committee has reported 39 allergenic LTPs from vegetables (n = 7), tree and weed pollen (n = 9), fruits (n = 18), nuts and seeds (n = 4), and latex (n = 1) [41]. Thaumatin-like proteins (TLPs) have long been known as major allergens in some fruits and pollen, such as allergen 3 from Cupressus arizonica (Cup a 3) [20]. No report has yet related either LTPs or TLPs to the allergenicity of L. perenne pollen.

RNA Library Assembly
Before RNA library assembly, ribosomal RNA was removed with the Ribo-Zero rRNA Removal Kit. Libraries were generated from 2 µg of total RNA (RIN > 9) with the TruSeq Stranded Total RNA Library Prep Kit and sequenced on a HiSeq2500 instrument (Illumina Inc, San Diego, CA, USA). Paired-end reads of 101 bp were obtained for six samples (three from Madrid and three from Ciudad Real). The estimated coverage was around 59 million reads per sample (one lane). Library generation and RNA sequencing were performed at Sistemas Genómicos S.L. (Valencia, Spain) following the manufacturer's instructions.

RNA Transcriptomics Analysis
Quality control of the raw data was performed with FastQC v0.11.4. The raw paired-end reads were then mapped against the Lolium perenne ASM173568v1 genome from the NCBI database using TopHat2 2.1.0 [42]. Reads of insufficient quality (Phred score < 5) were removed using Samtools 1.2 [43] and Picard Tools 2.12.1. The GC distribution (i.e., the proportion of guanine and cytosine bases along the reads) was then assessed; it should ideally fall between 40% and 60%. In addition, the distribution of duplicates, an indicator of sequencing quality, was evaluated to confirm that the sequencing contained a small proportion of duplicates. Expression levels were calculated using HTSeq [44], which uses only uniquely mapped reads to estimate gene expression and filters out multi-mapped reads. Differential expression between pollen from Madrid and pollen from Ciudad Real was assessed with the DESeq2 algorithm [45], using statistical packages in Python and R; significance was tested by applying a negative binomial distribution [44]. Genes were considered differentially expressed when the fold change (FC) was below −1.5 or above 1.5 and the FDR-corrected p-value (Padj) was ≤ 0.05 [46], to avoid identifying false positives across the differential expression data.
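DESeq2 reports log2 fold changes and, by default, Benjamini-Hochberg (BH) adjusted p-values (padj). The sketch below re-implements the BH adjustment in plain Python for illustration and applies the thresholds stated above (|FC| ≥ 1.5, which is the same test as |log2 FC| ≥ log2 1.5 ≈ 0.585, and Padj ≤ 0.05). It assumes that positive log2 fold changes mean higher expression in Madrid; the function names and labels are hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg FDR adjustment; returns adjusted p-values in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def signed_fold_change(log2fc):
    """Express a log2 fold change as a signed linear FC (1 -> 2.0, -1 -> -2.0)."""
    ratio = 2.0 ** log2fc
    return ratio if ratio >= 1.0 else -1.0 / ratio


def call_deg(log2fc, padj, fc_cut=1.5, fdr=0.05):
    """Classify one gene; |FC| >= fc_cut is equivalent to |log2FC| >= log2(fc_cut)."""
    if padj > fdr or abs(log2fc) < math.log2(fc_cut):
        return "not_DE"
    # Assumption: positive log2FC means higher expression in Madrid.
    return "up_Madrid" if log2fc > 0 else "up_CiudadReal"


padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print(padj)
print(call_deg(1.0, padj[0]))  # up_Madrid
```

In a real analysis the padj column returned by DESeq2 would be used directly; the re-implementation only shows what that column means.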

Functional Enrichment Analysis
Gene category enrichment analysis was performed by comparing the differentially expressed genes against the UniProt database using BLASTX, with an e-value cutoff of 0.01 and a minimum protein length/transcript ratio of 40%. With the resulting terms, an over-representation test was performed using an in-house R script developed at Sistemas Genómicos (Valencia, Spain). The distribution of DEGs within GO categories was plotted with WEGO (http://wego.genomics.org.cn/).
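The in-house R script is not described further. A common way to run such an over-representation test is a one-sided hypergeometric test per GO term, which the sketch below assumes; the helper names are hypothetical and the example sets are toy data.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    tail = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1))
    return tail / comb(population, draws)


def term_enrichment(deg_ids, term_ids, background_ids):
    """One-sided over-representation p-value for a single GO term.

    background_ids: all annotated genes tested; term_ids: genes annotated with
    the term; deg_ids: differentially expressed genes.
    """
    background = set(background_ids)
    degs = set(deg_ids) & background
    term = set(term_ids) & background
    k = len(degs & term)  # DEGs carrying the term
    return hypergeom_sf(k, len(background), len(term), len(degs))


# Toy data with hypothetical gene IDs.
background = {f"g{i}" for i in range(10)}
term = {"g0", "g1", "g2", "g3"}
degs = {"g0", "g1", "g2"}
print(term_enrichment(degs, term, background))
```

With a background of 10 genes, a term annotating 4 of them, and 3 DEGs that all carry the term, the tail probability is C(4,3)/C(10,3) = 4/120. The per-term p-values would typically then be FDR-adjusted across all terms tested.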

Conclusions
In conclusion, the results of this work may be very useful, since we describe for the first time in L. perenne genes coding for proteins with potential allergenic capacity that are overexpressed under higher air pollution. These proteins must be produced by heterologous cloning and expression before their allergenicity can be checked with skin tests. If the responses are positive, our contribution to the knowledge of L. perenne allergens and their relationship with atmospheric pollution would be confirmed. This would also contribute to the development of more effective vaccines and could help resolve the problem of allergic responses in patients with a negative skin prick test (SPT).