In Silico Comparative Exploration of Allergens of Periplaneta americana, Blattella germanica and Phoenix dactylifera for the Diagnosis of Patients Suffering from IgE-Mediated Allergic Respiratory Diseases

The burden of allergic illnesses is continuously rising, and patient diagnosis is a significant problem because hereditary and environmental variables interact so intricately. The past three to four decades have seen a surge of allergies in high-income countries, and asthma is reported to affect around 300 million individuals worldwide. Identifying clinically important allergens for the accurate diagnosis and classification of IgE-mediated allergic respiratory diseases would be beneficial for implementing standardized allergen-associated therapy. Therefore, the current study includes an in silico analysis to identify potential IgE-mediated allergens in date palms and cockroaches. Such an immunoinformatic approach aids the prioritization of allergens with probable involvement in IgE-mediated allergic respiratory diseases. Immunoglobulin E (IgE) was used for molecular dynamics simulations, antigen–antibody docking analyses, epitope identification, and characterization. The potential of the prioritized allergens (Per a 7, Per a 1.0102, and Bla g 1.0101) in IgE-mediated allergic respiratory diseases was explored through the evaluation of physicochemical characteristics, interaction observations, docking, and molecular dynamics simulations for drug and vaccine development.


Introduction
Asthma and allergic rhinitis are two of the most prevalent respiratory allergies worldwide, and their prevalence is steadily rising. Traditional ways of life and the environments in which people live have an impact on the prevalence of asthma [1][2][3]. According to respiratory allergy data from the Kingdom of Saudi Arabia, allergic rhinitis and bronchial asthma are present in 13.5% and 11.2% of the population, respectively [4]. German cockroaches are among the most common sources of indoor allergens in countries where meals are traditionally eaten on the floor, such as Saudi Arabia and other Gulf countries [5][6][7][8]. As a result of improved illness characterization, particularly through the application of cutting-edge technologies,

Retrieval of the Protein Sequences of Date Palm and Cockroach Allergens
The sequences of allergen proteins were obtained from the UniProt online database (https://www.uniprot.org/help/uniprotkb, accessed on 24 May 2022), a major repository of protein sequences and annotations used to carry out comprehensive genomic and proteomic analyses.
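As an illustration of this retrieval step, the following is a minimal Python sketch, not the authors' procedure. It assumes UniProt's public REST endpoint for single-entry FASTA downloads; the function names `parse_fasta` and `fetch_sequence` are hypothetical helpers introduced here.

```python
from urllib.request import urlopen

# Assumption: UniProt's public REST endpoint for single-entry FASTA downloads.
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def parse_fasta(text):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    # Join the wrapped sequence lines of each record into one string.
    return {h: "".join(parts) for h, parts in records.items()}


def fetch_sequence(accession, timeout=30):
    """Download one UniProtKB entry as FASTA and return (header, sequence)."""
    url = UNIPROT_FASTA_URL.format(accession=accession)
    with urlopen(url, timeout=timeout) as response:
        text = response.read().decode("utf-8")
    records = parse_fasta(text)
    return next(iter(records.items()))
```

With network access, `fetch_sequence("P01854")` would request the human IgE entry whose UniProt ID is cited in the docking section; allergen accessions would be handled the same way.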

Physiochemical Parameter Evaluations of the above Allergens
The physiochemical properties of the retrieved proteins were investigated using the ProtParam server [21], which computes theoretical parameters such as molecular weight, amino acid composition, isoelectric point (pI), and instability index.

Functional Classifications
The distinction between virulent and non-virulent proteins is a primary step in understanding pathogenesis in organisms. The VICMpred online prediction server [22], which was developed for the functional classification of bacterial proteins, uses a bi-layer cascade SVM approach that applies sequence information to predict different virulence factors. The VICMpred web server applies pattern-based approaches to amino acid sequences, and predictions with median values >1.0 were considered significant for functional classification.

Subcellular Localization
Protein localization is a significant aspect of drug target discovery, as cytoplasmic and membrane proteins have been recognized as pharmacological targets. Since no information regarding the subcellular localization of these protein sequences was available at the time, Plant-mSubP, a two-level support vector machine tool [23] for the prediction of subcellular localizations of single and multiple protein sequences, was used for Phoenix dactylifera. For the American cockroach (Periplaneta americana) and German cockroach (Blattella germanica) allergens, WoLF-PSORT [24] was used to predict subcellular localization based on sorting signals, amino acid composition, and functional motifs, such as DNA-binding motifs. More than one software package was used to improve the accuracy of the computational predictions.

Prediction of IgE Epitopes and Allergenic Site Prediction
The AlgPred server [25] was used to assess the allergenicity potential of the protein sequences. It integrates SVM methods based on amino acid or dipeptide composition, IgE epitope mapping, BLAST searching against allergen representative peptides (ARPs), and the MAST (Motif Alignment and Search Tool)-MEME (Multiple EM for Motif Elicitation) suite to measure putative allergenicity; default parameters were used [26]. Further, AllerCatPro 2.0 [27] predicts the allergenic potential of protein sequences based on the similarity of their three-dimensional structures and amino acid sequences to protein allergens derived from public repositories. IgE sensitization towards proteins is frequently recognized upon exposure to aeroallergens, food allergens, and personal care products [28].

Secondary Structure Prediction
The SOPMA web server [29] was used to predict the secondary structures of the target protein sequences. This online server provides simple and accurate predictions of the main secondary structure elements: alpha helices, beta turns, extended strands, and random coil regions.

Tertiary Structure Prediction (3D Model) and Validation
Homology modeling was performed using MODELLER to generate the 3D molecular structures of identified stable protein sequences of date palm and cockroach allergens that contain experimentally proven IgE epitopes [30,31]. MODELLER is a computational platform for comparative protein structure modeling which can be used to generate tertiary protein models [32].

Antigen-Antibody Docking Studies
Molecular docking analysis of all the prioritized antigenic sites and IgE was performed to estimate binding affinities. The three-dimensional structure of human IgE was retrieved from the RCSB PDB (PDB ID: 4J4P; UniProt ID: P01854). The structure is a complex of human IgE-Fc with two bound Fab fragments. UCSF Chimera (https://www.cgl.ucsf.edu/chimera/, accessed on 24 May 2022) was used for manual preprocessing of the structures: homodimers were reduced to a single chain to reduce docking time, and non-amino acid molecules (ligands, ions, and solvent water) were removed to prevent hindrances during docking. Rigid-body molecular docking of the allergen models with the processed receptor was performed based on shape-complementarity principles using the PatchDock server [33]. This server divides a protein's surface into small patches (convex, concave, and flat) using a segmentation algorithm; the patches are then superposed using a shape-matching algorithm. The top conformations obtained with PatchDock were subjected to docking score refinement using FireDock [34]. FireDock refined the docked poses by optimizing side-chain conformations and rigid-body orientations via Monte Carlo simulation.
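The preprocessing described above was done manually in UCSF Chimera. The same idea can be sketched in a few lines of Python, assuming standard fixed-column PDB records; `clean_pdb` is a hypothetical helper, not a Chimera command.

```python
def clean_pdb(pdb_text, chain_id):
    """Keep only the ATOM records of one chain.

    HETATM records (ligands, ions, solvent water) and all other chains are
    dropped, mirroring the manual clean-up described in the text.
    """
    kept = []
    for line in pdb_text.splitlines():
        # In the PDB format, columns 1-6 hold the record name and
        # column 22 (index 21) holds the chain identifier.
        if line.startswith("ATOM") and len(line) > 21 and line[21] == chain_id:
            kept.append(line)
    kept.append("END")
    return "\n".join(kept) + "\n"
```

One caveat of this simple filter is that modified residues stored as HETATM records would also be removed, so the output should be inspected before docking.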

MD Simulation
To analyze conformational stability, molecular dynamics simulations were performed using GROMACS v5.1.5 and the OPLS-AA/L all-atom force field (2001 amino acid dihedrals). To study the physical movements of the interfacial atoms and the stabilities of the complexes in explicit water boxes (dodecahedrons), docked complexes were simulated at 309 K for 25 ns [35,36]. We used both NVT and NPT ensembles to mimic real experimental conditions. A six-step procedure was followed to run and analyze the MD simulations: (a) energy minimization of the solvent molecules prior to the entire system using the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm; (b) restraint of non-hydrogen solute atoms at 300 K and 1 bar for 40 ns to attain equilibrium states; (c) control of temperature and pressure during initial simulations using the Berendsen thermostat and barostat algorithms; (d) generation of initial trajectories for RMSD analysis; (e) RMSF and hydrogen bond analysis; and (f) determination of the radii of gyration for the antigen-antibody docked complexes [37,38].
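The trajectory metrics named in steps (d) to (f) have simple definitions. The following pure-Python sketch is for illustration only; the study itself used GROMACS analysis tools. It assumes that coordinates are given as (x, y, z) tuples and that frames have already been superposed on the reference; the function names are hypothetical.

```python
import math


def rmsd(coords, reference):
    """Root mean square deviation between two pre-aligned coordinate sets."""
    total = sum((a - b) ** 2
                for p, q in zip(coords, reference)
                for a, b in zip(p, q))
    return math.sqrt(total / len(coords))


def rmsf(frames):
    """Per-atom root mean square fluctuation about the time-averaged position."""
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for i in range(n_atoms):
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        msd = sum(sum((f[i][k] - mean[k]) ** 2 for k in range(3))
                  for f in frames) / n_frames
        result.append(math.sqrt(msd))
    return result


def radius_of_gyration(coords, masses):
    """Mass-weighted RMS distance of the atoms from their center of mass."""
    total_mass = sum(masses)
    com = [sum(m * p[k] for m, p in zip(masses, coords)) / total_mass
           for k in range(3)]
    weighted = sum(m * sum((p[k] - com[k]) ** 2 for k in range(3))
                   for m, p in zip(masses, coords))
    return math.sqrt(weighted / total_mass)
```

For example, two unit masses at (-1, 0, 0) and (1, 0, 0) have their center of mass at the origin, so `radius_of_gyration` returns 1.0 in the units of the input coordinates.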

Retrieval of the Protein Sequences for Date Palm and Cockroach Allergens
UniProt, a freely available database of protein sequences and their functional annotations from several genome sequencing projects, was utilized to mine the primary dataset protein sequences of the date palm and cockroach allergens. The primary dataset was then subjected to physiochemical characterization to evaluate pathogenic and allergenicity potential.

Physiochemical Parameter Evaluation of the above Allergens
ProtParam enabled the theoretical computation of the instability indices, molecular weights, and GRAVY values of the identified allergens of date palms and cockroaches (Table 1). The instability index estimates the stability of a protein in a test tube. The GRAVY (grand average of hydropathicity) is the sum of the hydropathy values of all amino acids divided by the number of residues in the sequence. The relative volume occupied by aliphatic side chains (alanine, valine, isoleucine, and leucine) in a protein sequence is termed the aliphatic index. The lengths of the screened allergens ranged between 151 and 823 amino acid residues. In this study, only stable proteins with instability indices <40 were selected for further allergenicity and functional predictions.
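The two composition-based parameters defined above can be reproduced directly from a sequence. The sketch below is a minimal illustration rather than ProtParam's own code: it uses the Kyte-Doolittle hydropathy scale for GRAVY and the standard aliphatic index weights (2.9 for valine, 3.9 for isoleucine and leucine, applied to mole percentages). The function names and the `is_stable` helper are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathicity: summed hydropathy / sequence length."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def aliphatic_index(sequence):
    """Aliphatic index from the mole percentages of Ala, Val, Ile, and Leu."""
    n = len(sequence)

    def mole_percent(aa):
        return 100.0 * sequence.count(aa) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def is_stable(instability_index, cutoff=40.0):
    """Apply the selection rule from the text: instability index below 40."""
    return instability_index < cutoff
```

The instability index itself depends on a table of dipeptide weights that is omitted here; `is_stable` only applies the cutoff to a value obtained elsewhere, for instance from ProtParam.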

Functional Characterization
VICMpred enabled the functional characterization of the submitted protein sequences into major functional classes (Table 2). This machine-learning-based tool assigned the primary protein sequences to metabolism molecules (32.35%), virulence factors (5.88%), cellular processes (44.12%), and information and storage (17.65%). Functional characterization is important in proteomics studies to understand biological functions at the system level.

Subcellular Localization
Plant-mSubP and WoLF-PSORT were both used to assign the primary protein sequences to cellular compartments. The sequences were mainly sorted to the cell membranes, extracellular matrices, mitochondria, and cytoplasmic domains (Supplementary Table S1). The prioritized subcellular localizations after analysis were the cytoplasm, mitochondria, plastids, and vacuoles.

Prediction of IgE Epitopes and Allergenic Site Prediction
AlgPred was used to predict allergenic proteins and map IgE epitopes, identifying allergenic peptides by utilizing five approaches (IgE epitope + ARPs BLAST + MAST + SVM) to attain an overall accuracy of 85%. Protein sequences were predicted to be nonallergenic if the score was negative (Table 3). The results highlighted that the majority of the submitted protein sequences did not contain an experimentally proven IgE epitope. A total of nine protein sequences were screened for allergenicity potential. Further, AllerCatPro 2.0 also validated the allergenicity potential of the submitted protein sequences by reporting strong evidence (Table 4), which results from high similarity to allergens present in public repositories.

Secondary Structure Prediction
In the absence of a three-dimensional structure and template sequence, secondary structure analysis is necessary to identify the percentages of alpha helices, extended strands, beta turns, and random coils in protein sequences (Table 5). Such analysis helps in the generation of the 3D structures of proteins and also provides information about protein activities, relationships, and functions. This study identified that alpha helices accounted for the majority of the coverage of the screened allergens, followed by random coils, beta turns, and extended strands. This implies that hydrogen bonds are mainly responsible for protein-protein interactions in further molecular docking procedures.
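The percentages reported for each allergen can be derived from a per-residue state string. This is a minimal sketch that assumes the four-state, one-letter output convention (h, e, t, c); the example string in the usage note is hypothetical and not taken from Table 5.

```python
from collections import Counter

# Assumption: four-state, one-letter codes for the predicted states.
STATE_NAMES = {
    "h": "alpha helix",
    "e": "extended strand",
    "t": "beta turn",
    "c": "random coil",
}


def composition(states):
    """Return the percentage of residues assigned to each secondary structure."""
    counts = Counter(states.lower())
    total = len(states)
    return {name: 100.0 * counts.get(code, 0) / total
            for code, name in STATE_NAMES.items()}
```

For a hypothetical ten-residue prediction `"hhhhhcctee"`, this yields 50% alpha helix, 20% extended strand, 10% beta turn, and 20% random coil.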

Tertiary Structure Prediction (3D model) and Validation
The protein sequences (Bla g 1.0101, accession number: AF072219.2; Per a 7, accession number: ACS14052.1; and Per a 7.0102, accession number: AF106961.1) were retrieved from the UniProt database. MODELLER was then used to generate the three-dimensional structures of the protein sequences, with template structures identified under PDB codes 4JRB, 7KO4, and 6X5Z. The models were validated using the SAVES server. We also used a high-resolution structure refinement method, ModRefiner (https://zhanglab.ccmb.med.umich.edu/ModRefiner/, accessed on 15 June 2022), which improves poor rotamers by simulating both protein backbones and side chains. The tertiary models generated were subjected to molecular docking analysis.

Antigen-Antibody Docking Studies
Protein-protein docking analysis was performed after protein backbone stabilization to determine the binding affinities of the plausible antigenic protein sequences. Ions, ligands, and other non-amino acid molecules were visualized using UCSF Chimera (Figure 1). Rigid docking of the antigenic sequences with the antibody was then carried out using PatchDock, and the generated poses were ranked by binding energy. The top 10 docking solutions were taken forward for pose refinement using FireDock. The refined complexes were screened for the lowest binding energies (Bla g 1.0101-IgE with −19.08 kcal/mol, Bla g 7-IgE with 11.2 kcal/mol, Per a 1.0101-IgE with −9.22 kcal/mol, Per a 1.0103-IgE with −7.12 kcal/mol, Per a 1.0104-IgE with −8.22 kcal/mol, Per a 1.0201-IgE with −7.76 kcal/mol, Per a 1.0102-IgE with −21.33 kcal/mol, and Per a 7-IgE with −19.71 kcal/mol). The visualization of protein-protein docking validated the major role of hydrogen bonding between the proteins. The docked complexes with the lowest binding energies were taken forward to MD simulation. In the case of Per a 1.0102-IgE, hydrogen bonds were formed between K90, Y82, N103, and Y105 of Per a 1.0102 and the antigen recognition sites of IgE. A155, K160, and Q147 of Per a 7 formed hydrogen bonds with variable regions of IgE. For Bla g 1.0101-IgE, the major interacting partners were Q229, L220, and K223 (Figure 1). Such prioritization through molecular docking enabled the identification of potential allergens to be considered for vaccine and drug discovery.
Immunoglobulin E (IgE) is essential for the onset and persistence of most immediate-type allergies and several asthma phenotypes. As a result, IgE is a key target for both diagnostic and therapeutic objectives [39].
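The screening of refined complexes by binding energy, as reported in this section, can be sketched as a simple ranking. The energies below are copied from the text; the helper name `top_complexes` and the choice of three complexes (matching the three prioritized allergens) are illustrative assumptions.

```python
# FireDock binding energies (kcal/mol) for each allergen-IgE complex,
# as reported in the text of this section.
BINDING_ENERGIES = {
    "Bla g 1.0101": -19.08,
    "Bla g 7": 11.2,
    "Per a 1.0101": -9.22,
    "Per a 1.0103": -7.12,
    "Per a 1.0104": -8.22,
    "Per a 1.0201": -7.76,
    "Per a 1.0102": -21.33,
    "Per a 7": -19.71,
}


def top_complexes(energies, n=3):
    """Return the n complexes with the lowest (most favorable) energies."""
    return sorted(energies.items(), key=lambda item: item[1])[:n]


for name, energy in top_complexes(BINDING_ENERGIES):
    print(f"{name}-IgE: {energy:.2f} kcal/mol")
```

Sorting in ascending order places Per a 1.0102, Per a 7, and Bla g 1.0101 first, which agrees with the three complexes carried forward to MD simulation.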
There are two categories of IgE-binding epitopes: linear (sequential) and conformational (discontinuous). Linear epitopes are continuous amino acid sequences, whereas conformational epitopes are formed by amino acids that are spatially close but far apart in the protein's primary sequence [40]. Bla g 1.0101 is secreted in the cockroach digestive tract, and sensitization occurs through inhalation of allergen-carrying faecal particles that are released into the environment [41]. Tropomyosins, for example, Per a 7, play a role in muscle contraction [42]. Tropomyosin is a pan-allergen found in the muscles of many animals [43,44]. Initially identified as a major shrimp allergen, it has since been found in a variety of insects and causes IgE cross-reactivity [45]. A study using RNA interference-mediated knockdown of this allergen in Periplaneta americana confirmed that Per a 1 is involved in digestion and nutrient absorption [43].

MD Simulation
MD simulations of the selected complexes were performed for 25 ns using GROMACS v5.1.2 and analyzed. The complexes were solvated in dodecahedron water boxes using the four-point TIP4P rigid water model with at least 1 nm of solvent on all sides, and neutralization was achieved by adding Na+ ions. The particle mesh Ewald (PME) summation method was used for the treatment of long-range electrostatic interactions, with all bonds constrained using the LINCS algorithm. Energy minimization of the system was carried out using the steepest descent method, and the temperature and pressure were maintained at 310 K and 1 bar using a V-rescale thermostat and a Parrinello-Rahman barostat. Conformations were saved at intervals of 10 ps throughout the 25 ns trajectory. Trajectory analysis showed that the top three selected complexes initially deviated by about 2 Å but achieved stability later (Figure 2). Residue-based root mean square fluctuation (RMSF) analysis of antigen-antibody docked complexes was performed to understand the flexibility of each residue, as depicted in Figure 2a. RMSF values for all docked complexes showed large fluctuations (0.33-1.67 nm) for the initial 20 residues in each case due to the unavailability of structural information for the target proteins (Figure 2d). Further, lower fluctuations at the binding and active sites indicated the intactness and rigidity of the binding cavities. The gmx gyrate tool was used to calculate radius of gyration (Rg) values, which indicate the compactness and structural changes of the docked complexes. Rg is the mass-weighted root mean square distance of the atoms from the center of mass of the complex (Figure 2b). Average Rg values for Bla g 1.0101, Per a 1.0102, and Per a 7 ranged between 2.36 and 2.66 nm, 2.29 and 2.89 nm, and 2.44 and 2.56 nm, respectively, with no fluctuations after 25,000 ps. Further, Rg values correlated with RMSD values for backbone Cα atoms, validating the stability of the prioritized antigen-antibody complexes. These results indicate the suitability of the prioritized allergenic protein sequences among all the proteomes for further investigation in bench-top experiments.
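The production-run settings described in this section map onto GROMACS .mdp options roughly as sketched below. This is an illustrative reconstruction, not the authors' input file. The 2 fs time step, the coupling time constants, the single "System" coupling group, and the water compressibility are assumptions that the text does not state; `build_mdp` is a hypothetical helper.

```python
SIM_TIME_NS = 25.0       # total simulation length, from the text
SAVE_INTERVAL_PS = 10.0  # trajectory output interval, from the text
TIME_STEP_PS = 0.002     # assumption: 2 fs time step (not stated in the text)


def build_mdp():
    """Return .mdp text for a production run with the settings described."""
    nsteps = round(SIM_TIME_NS * 1000.0 / TIME_STEP_PS)
    save_every = round(SAVE_INTERVAL_PS / TIME_STEP_PS)
    params = {
        "integrator": "md",
        "dt": TIME_STEP_PS,
        "nsteps": nsteps,
        "nstxout-compressed": save_every,
        "coulombtype": "PME",              # long-range electrostatics
        "constraints": "all-bonds",
        "constraint-algorithm": "lincs",
        "tcoupl": "V-rescale",
        "tc-grps": "System",               # assumption: one coupling group
        "tau-t": 0.1,                      # assumption (ps)
        "ref-t": 310,                      # K, from the text
        "pcoupl": "Parrinello-Rahman",
        "tau-p": 2.0,                      # assumption (ps)
        "ref-p": 1.0,                      # bar, from the text
        "compressibility": 4.5e-5,         # assumption: water (bar^-1)
    }
    return "\n".join(f"{key:<22} = {value}" for key, value in params.items()) + "\n"


print(build_mdp())
```

With a 2 fs step, 25 ns corresponds to 12,500,000 steps and a 10 ps output interval to one frame every 5000 steps; a different time step would change both values.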

Conclusions
Blattella germanica allergens appear to be about equally concentrated in homes in Saudi Arabia, even though the prevalence of the Bla g 2 allergen was found to be slightly greater in patients' homes. As there is little information available regarding cockroach-related allergens, it is essential to investigate this issue in detail so that a remedy can be prepared in advance. The current study intended to find immunodominant peptides that may be employed in the future to develop a universal peptide vaccine to treat cockroach-related illnesses, which may help reduce the future incidence of asthma and allergic rhinitis. In this study, an immunoinformatics approach was applied to evaluate the immunogenicity of prioritized proteins. Research incorporating experimental confirmation of these predicted epitopes is necessary to verify their capacity to stimulate B cells and T cells before they can be used efficiently as vaccine candidates and as diagnostic agents for cockroach allergy.

Supplementary Materials:
The following supporting information can be downloaded at: www.mdpi.com/xxx/s1. Table S1. The sorting of protein sequences.

Data Availability Statement:
The authors confirm that the data supporting the findings of this study are available within the article, including the supplementary file. Raw data that support the findings of this study are available from the corresponding author upon request.