Semantic Analysis of Posttranslational Modification of Proteins Accumulated in Thyroid Cancer Cells Exposed to Simulated Microgravity

When monolayers of tissue cancer cells of various origins are exposed to real or simulated microgravity, many cells leave the monolayer and assemble to three-dimensional (3D) aggregates (spheroids). In order to define the cellular machinery leading to this change in growth behavior of FTC-133 human thyroid cancer cells and MCF-7 breast cancer cells, we recently performed proteome analyses on these cell lines and determined the proteins’ accumulation in monolayer cells grown under 1g-conditions as well as in the cells of spheroids assembled under simulated microgravity during three and 14 days, respectively. At that time, an influence of the increment or decrement of some of the more than 5000 proteins detected in each cell line was investigated. In this study, we focused on posttranslational modifications (PTMs) of proteins. For this purpose, we selected candidates from the list of the proteins detected in the two preceding proteome analyses, which showed significant accumulation in spheroid cells as compared to 1g monolayer cells. Then we searched for those PTMs of the selected proteins, which according to the literature have already been determined experimentally. Using the Semantic Protocol and RDF Query Language (SPARQL), various databases were examined. Most efficient was the search in the latest version of the dbPTM database. In total, we found 72 different classes of PTMs comprising mainly phosphorylation, glycosylation, ubiquitination and acetylation. Most interestingly, in 35 of the 69 proteins, N6 residues of lysine are modifiable.


Introduction
The occurrence of thyroid cancer is rising, especially among women. Worldwide, this tumor has become the seventh most common cancer among women [1,2]. If the progress of this cancer is driven by differentiated epithelial cells, it can frequently be cured [3]. Poorly differentiated or anaplastic types of thyroid cancer, however, have an unfavorable prognosis and current therapy options are limited [4,5]. Therefore, new strategies have been investigated in order to find novel targets for therapy of this kind of tumor [6] and to prevent metastasis [7].
For several years we have investigated the behavior of malignant thyroid cells, putting emphasis on cellular differentiation and on their migration [8]. In this context, we exposed human follicular thyroid cancer cells to simulated and real microgravity (µg) in vitro [9]. Under this condition, a part of the cells leaves the two-dimensional (2D) monolayers. Normally, thyroid cancer cells grow adherently in a 2D layer in vitro when they are seeded in culture flasks and incubated in an incubator on Earth at 1g normal laboratory conditions. The cells leaving monolayers assemble scaffold-free into three-dimensional (3D) aggregates (so-called spheroids) and continue to grow there [10]. This change of phenotype happens during spaceflights, as well as during cell cultivation on devices preventing sedimentation by annulling or randomization of the gravity vector [10][11][12][13][14]. It is accompanied by structural changes of the cytoskeleton as well as by alterations of the mRNA expression pattern and the protein production [15][16][17] and appears to mimic metastasis-like scaffold-supported cell migration [18].
Mass spectrometry is a powerful method to analyze the protein production of human cells [19]. It currently is applied in many fields including space research [20][21][22]. Recently, we performed deep proteome analyses on thyroid cancer cells unveiling about 5900 proteins, and on MCF-7 breast cancer cells for comparison, aiming to shed light on the molecular machinery, which enables the cells to move from one kind of growth (monolayer) to another one (spheroid) [17,23]. The studies revealed a number of proteins, whose accumulations were significantly different, when they were found either in monolayer cells of normal 1g cultures or in spheroids formed under simulated microgravity. Taken together they suggested that there is a relationship between the increment and decrement of distinct proteins and the kind of growth of the cells [17,23]. Furthermore, we observed phosphorylation of profilin-2 and de-phosphorylation of extracellular signal-regulated kinases (ERKs) 1/2 on thyroid cells exposed to microgravity [24,25] and recognized that genes mostly up-and down-regulated during a ten-day space mission coded for enzymes involved in posttranslational modifications (PTM) of target proteins [26]. The observations are in accordance with publications of other researchers [27,28]. Hence, it is probable that a protein's activation status determined by PTM, such as phosphorylation, glycosylation, ubiquitination and others [29,30], is also important for the cellular behavior under µg.
Consequently, we intend to gain comprehensive information about PTMs, which occur in thyroid cells forming a spheroid. We expect various PTMs to occur in a tremendous number of proteins [31,32], which appear to be involved in the cells leaving a monolayer and joining a 3D aggregate. Hence, we assume that a high number of experiments will be required until a reasonable overview will be obtained. In order to keep this number as low as possible, in this study we try to find PTMs that, according to the literature, have already been observed on proteins involved in spheroid formation.
PTMs already observed experimentally are stored in various public repositories including the newly extended comprehensive dbPTM database [33]. Using Semantic Protocol and RDF Query Language (SPARQL) endpoints and a semantic data model allows the retrieval and handling of data stored in Resource Description Framework (RDF) format for further interlinking of various databases and enrichment of information through inference queries. In order to accomplish this, we applied the Sentient Knowledge Explorer (KE), semantic retrieval software that enables mapping, aligning and merging of information from several relevant resources [34,35], to establish the most comprehensive evidence possible.
The 69 proteins investigated (Table 1) were selected from a list of 5989 human thyroid proteins identified and quantified in a recent proteome study on FTC-133 thyroid cancer cells, which grew in a monolayer under normal 1g laboratory conditions or within spheroids exposed to a random positioning machine (RPM) simulating microgravity during three days of incubation [17]. These proteins were of interest because they were found in spheroid cells but not in control monolayer cells, or were found in spheroid cells at a 1.8-fold higher accumulation than in control cells. In addition, they were also detected by a second independent proteome analysis on MCF-7 breast cancer cells, which grew in a monolayer under normal 1g laboratory conditions or within spheroids exposed to an RPM for 14 days [23]. Their up-regulation in spheroid cells as compared to control cells was similar to that observed in the corresponding FTC-133 cells (Table 1). The selected 69 proteins comprised 1.2% of the total proteins detected in the FTC-133 spheroid cells. Since each of these proteins was detected in at least four samples and accumulated in FTC-133 spheroids after three days as well as in MCF-7 spheroids after 14 days, they were considered candidates for triggering a transition from a 2D to a 3D kind of growth. To test this hypothesis, they were further analyzed using in silico methods.
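The selection rule described above can be illustrated with a minimal Python sketch. It is not the procedure used in the study: the function name, the toy protein names and the intensity values are hypothetical, LfQ values are assumed to be linear (not log-transformed) intensities, and the 1.8-fold figure is treated as a lower bound.

```python
from typing import Optional

# Threshold taken from the text; treating it as ">=" is an assumption.
FOLD_THRESHOLD = 1.8


def is_candidate(lfq_monolayer: Optional[float],
                 lfq_spheroid: Optional[float],
                 threshold: float = FOLD_THRESHOLD) -> bool:
    """Return True if a protein meets the spheroid-accumulation criterion."""
    if lfq_spheroid is None or lfq_spheroid <= 0:
        return False  # not detected in spheroid cells
    if lfq_monolayer is None or lfq_monolayer <= 0:
        return True   # detected in spheroid cells only
    return lfq_spheroid / lfq_monolayer >= threshold


# Hypothetical intensities: (monolayer, spheroid); None means "not detected".
toy_lfq = {
    "PROT_A": (1.0e6, 2.5e6),  # 2.5-fold higher in spheroids
    "PROT_B": (None, 4.0e5),   # spheroids only
    "PROT_C": (3.0e6, 3.3e6),  # 1.1-fold, below the threshold
}

selected = [name for name, (mono, sph) in toy_lfq.items()
            if is_candidate(mono, sph)]

# Share of the 69 selected proteins among the 5989 identified proteins.
share_percent = round(100 * 69 / 5989, 1)
```

With the toy values, `selected` contains PROT_A and PROT_B, and `share_percent` reproduces the 1.2% stated in the text.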

Characterization of the Selected 69 Proteins According to Localization and Interaction
In a first approach, the cellular location of the selected proteins was studied using Elsevier Pathway Studio v11 (Elsevier, Amsterdam, the Netherlands). Figure 1 shows that the majority of these proteins are located within the cytoplasm, while others are active within the nucleus, the mitochondria, the endoplasmic reticulum (ER), the membrane and the extracellular matrix. At that point, it was of interest to see whether the selected proteins represent single independent hits or are members of a network. The Pathway Studio analysis revealed that 20 of these proteins form a network of interaction either on a gene level or on a protein level. Central components of the network were heme oxygenase 1 (HMOX1), thioredoxin (TXN) and NAD(P)H dehydrogenase [quinone] 1 (NQO1) (Figure 1). HMOX1 and NQO1 are both regulated by the transcription factor Nrf2 [36]. HMOX1 was of special interest because it had already shown up in an earlier study and was considered important for spheroid formation [37]. In addition to the interactions shown in Figure 1, it has relationships to nuclear factor kappa B [25], to connective tissue growth factor, caveolin 1, and to intercellular adhesion molecule 1. All these proteins were found in earlier studies and were considered key proteins of spheroid formation [24,38,39].

Creation of a Semantic Knowledge Base on PTM Modification
Regarding the 69 proteins shown in Table 1, which were selected due to their differences of accumulation in monolayer cells exposed to gravity as compared to spheroids cultured under microgravity, we were not only interested in cellular localization and interaction. We also focused on their posttranslational modifications, which generally have a great influence on cellular behavior [29,30]. Since a great number of PTMs could theoretically be considered for the selected 69 proteins, we retrieved only PTMs that, according to the literature, have already been verified for these proteins experimentally, and tried to set up a relationship to their function and their influence on the behavior of the cells. In order to find relevant literature and information about the proteins' PTMs, we applied the KE, which enables searching of various databases for many PTMs and harmonization of the results. Databases searched included UniProt, Entrez, Reactome and dbPTM ( Figure 2). These databases contain information supplementing each other, so that not only PTMs can be found but also their role within the concert of life.
Using KE, relevant information about each of the 69 proteins was collected from UniProt in a first step. For this purpose, a spreadsheet was created containing the names of the 69 proteins selected together with their gene names, UniProt accession numbers and experimental data, which indicated the individual cellular accumulation by LfQ scores (Table 1).
To build the initial Semantic Knowledge Base (SKB), the spreadsheet was imported into KE and mapping of the experimental data based on their protein accession numbers to UniProt's SPARQL endpoint was applied [41]. A representative starting network for two selected proteins (Heme oxygenase 1 and NAD(P)H dehydrogenase [quinone] 1) from the 69 proteins of specific interest is shown in Figure 3. In the next step, SPARQL queries were developed by selecting nodes from the canvas of the network graphs and applying filters via setting a certain element's variable or by applying restrictions, such as ranges on numerical values, as described earlier [34,35]. They were translated to text queries by KE. After suitable formulation they could be used to access a number of databases. First, a query was formulated suitable to search selected U.S. National Library of Medicine (NLM) databases (Biosystems, Protein, Gene, Online Mendelian Inheritance in Man (OMIM), Single Nucleotide Polymorphisms (SNP)). Then the National Center for Biotechnology Information (NCBI) Entrez Application Programming Interface (API) Connector was used to import results from the SPARQL queries directly as RDF into the knowledge graph [42,43]. The information obtained augments the characterization of a selected protein and will help to relate PTMs to their biological processes and the signaling pathways involved. This is shown for heme oxygenase in regard to functional, genetic and biological aspects of human thyroid cells (Figure 4). The information was mainly included in literature references indicated by PubMed unique identifier (PMID) numbers.
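The mapping of accession numbers to UniProt's SPARQL endpoint can be sketched as follows. This is only an illustration of the kind of text query involved, not the query generated by KE: the function name and the selected variables are assumptions, and the two accession numbers are those quoted in this article for cofilin-1 and Rab-27B. The sketch only builds the query string and does not contact the endpoint.

```python
from typing import Iterable

# Public UniProt SPARQL endpoint; the query below is built but not sent.
UNIPROT_SPARQL_ENDPOINT = "https://sparql.uniprot.org/sparql"


def build_uniprot_query(accessions: Iterable[str]) -> str:
    """Build a SPARQL query returning names and genes for given accessions."""
    values = " ".join(f"uniprotkb:{acc.strip()}" for acc in accessions)
    return f"""PREFIX up: <http://purl.uniprot.org/core/>
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?protein ?fullName ?gene
WHERE {{
  VALUES ?protein {{ {values} }}
  ?protein a up:Protein ;
           up:recommendedName/up:fullName ?fullName .
  OPTIONAL {{ ?protein up:encodedBy/skos:prefLabel ?gene . }}
}}"""


# Accessions mentioned in the text: cofilin-1 (P23528), Rab-27B (O00194).
query = build_uniprot_query(["P23528", "O00194"])
```

The resulting string could be submitted to the endpoint with any HTTP or SPARQL client; restricting the search with a `VALUES` clause keeps it limited to the proteins listed in the spreadsheet.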
Subsequently, further SPARQL queries were used to expand the SKB by importing the protein's involvement in signaling pathways from Reactome [44], after cross-referencing with pathway information stored in the Kyoto Encyclopedia of Genes and Genomes (KEGG). In the lower part of Figure 4, the pathways found for heme oxygenase are indicated by black arrows to brown blocks. The identifiers R-HSA-917937 and R-HSA-6785807 point to pathways of iron uptake and transportation, and of interleukin 4 signaling, respectively.
In a final step, PTM information about the selected proteins was retrieved [29]. For this purpose, a SPARQL query was formulated to search the dbPTM database [33]. The query contains information obtained from UniProt and directs the program to search the dbPTM database for PTMs of all proteins imported into KE via the spreadsheet (Figure 5, Table 1). In the database, those fields were searched that hold information about the reference source, the type of modification and its location, together with a short segment of the modified sequence. In addition, the size of the area covered by the modification is available.
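The fields retrieved per modification can be pictured as a simple record type joined against the imported accession list. The following Python sketch is a hypothetical in-memory stand-in for that step: the class, its field names, the accession "TOY001" and all values are invented for illustration and do not reflect the dbPTM schema or the KE query.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class PtmRecord:
    """One experimentally reported modification site (hypothetical layout)."""
    accession: str         # UniProt accession of the substrate protein
    ptm_type: str          # class of the modification
    position: int          # residue position of the modified site
    sequence_window: str   # short sequence segment around the site
    pmid: str              # literature reference (PubMed identifier)
    asa: Optional[float]   # accessible surface area, if reported


def ptms_for(accessions: Iterable[str],
             records: Iterable[PtmRecord]) -> Dict[str, List[PtmRecord]]:
    """Group PTM records by accession, keeping only the requested proteins."""
    wanted = set(accessions)
    hits: Dict[str, List[PtmRecord]] = {}
    for rec in records:
        if rec.accession in wanted:
            hits.setdefault(rec.accession, []).append(rec)
    return hits


# Placeholder records; none of these values are taken from dbPTM.
toy_records = [
    PtmRecord("TOY001", "Phosphoserine", 3, "XXSXX", "0", 12.5),
    PtmRecord("TOY001", "N6-acetyllysine", 13, "XXKXX", "0", None),
    PtmRecord("TOY999", "Phosphotyrosine", 7, "XXYXX", "0", 4.0),
]

hits = ptms_for(["TOY001", "TOY002"], toy_records)
```

Proteins without any reported modification (here the hypothetical TOY002) simply do not appear in the result, and records for proteins outside the spreadsheet (TOY999) are ignored.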

Figure 5.
A typical graphical SPARQL query for searching for information about all proteins imported in the KE via a spreadsheet in databases of interest for the topic of a study (see also ref. [35]). The query shown directs KE to the dbPTM to search for substrate sites and location of modifications, as well as for the surface area covered by the modification (a). The graphical query is auto-translated into a SPARQL text query by KE and used for searches (b).
Iterative use of queries with filters for the parameters of interest enabled us to accumulate comprehensive information about the PTMs of 69 proteins (see Supplementary Table S1). Figure 6 shows PTMs (yellow-green boxes) for 23 proteins (icons with accession numbers) selected from the total 69 proteins in order to keep a clear view on details. It can be seen that there are proteins with many PTMs and others with only a few or one. For example, cofilin-1, with the accession number P23528, shows arrows to a considerable number of PTMs, while the Ras-related protein, Rab-27B (O00194), shows only one arrow which links the icon to an N-acetylation of threonine. Taken together, it is obvious that the selected proteins are frequently modified by phosphorylation, ubiquitination and glycosylation.

Analysis of the SKB on PTM Modification
The knowledge base created for 69 proteins is rather complex if it is represented as shown in Figure 6. Looking at such diagrams, recognizing detailed and useful information visually is difficult. Hence, the complexity was reduced by splitting the whole knowledge base into segments and visualizing and highlighting the results in individual graphs. For this purpose, graphical queries were applied to the entire network and their results highlighted. The aggregated query results were exported as spreadsheets to create final tables for all proteins involved in the study. Table 2 shows detailed data about five exemplary proteins.
A summary of PTMs found for all of the 69 proteins is provided in Supplementary Table S1, which shows the substrate sites already proved experimentally, but not sites theoretically modifiable, and indicates the accessible surface areas (ASA) covered by the modification. An evaluation of the table reveals 72 different classes of identified PTMs. In total, 406 classified PTMs were counted, with each class counted once per protein even when multiple sites of that protein were affected. Of these, the most prominent classes were phosphorylation (48%; phosphoserine, phosphotyrosine, phosphothreonine) and lysine-N6 modification (18.9%), followed by modifications of other residue nitrogens (12.8%) or sulfurs (4%), and by glycine-lysine dipeptide coupling (3.4%). It should also be mentioned that 35 of the 69 proteins were modified on N6 groups of lysine. This observation is of great interest, because a tremendous down-regulation of the gene of the protein-lysine 6-oxidase (LOX) has been observed in FTC-133 cells during the Shenzhou-8/SimBox space mission [16,26]. The protein-lysine 6-oxidase catalyzes oxidative deamination of lysine residues [45].
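The counting rule used for this evaluation (one count per protein and PTM class, regardless of the number of modified sites) can be made explicit with a short Python sketch. The function name and the toy site records are hypothetical; only the ratio 35/69 in the last line uses numbers from the text.

```python
from collections import Counter
from typing import Iterable, Tuple

SiteRecord = Tuple[str, str, int]  # (accession, PTM class, residue position)


def count_classes(site_records: Iterable[SiteRecord]) -> Counter:
    """Count each PTM class once per protein, ignoring repeated sites."""
    pairs = {(acc, cls) for acc, cls, _pos in site_records}
    return Counter(cls for _acc, cls in pairs)


# Hypothetical site-level records; TOY001 carries two phosphoserine sites.
toy_sites = [
    ("TOY001", "Phosphoserine", 3),
    ("TOY001", "Phosphoserine", 8),
    ("TOY001", "N6-acetyllysine", 13),
    ("TOY002", "N6-acetyllysine", 5),
    ("TOY002", "Phosphotyrosine", 20),
]

counts = count_classes(toy_sites)
total = sum(counts.values())
shares = {cls: round(100 * n / total, 1) for cls, n in counts.items()}

# Proteins carrying at least one lysine-N6 modification in the toy data.
n6_proteins = {acc for acc, cls, _pos in toy_sites if cls.startswith("N6-")}

# Fraction of proteins with N6-lysine modifications reported in the text.
n6_share_percent = round(100 * 35 / 69, 1)
```

In the toy data, the two phosphoserine sites of TOY001 contribute a single count, so five site records yield four classified PTMs. The last line shows that 35 of 69 proteins correspond to roughly half of the selected set.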

Conclusions
Proteins change their influence on the life of a cell when they accumulate to different levels within the cell and when they are modified after their translation. A comparative proteome analysis of thyroid cancer cells living within a monolayer under normal gravity or within spheroids under simulated µg showed an up-regulation of 69 proteins detected in spheroids. Applying the KE, we searched databases containing relevant literature and biological pathways. Most efficient was searching the dbPTM database, which tripled the number of results (Supplementary Table S1). Hence, we learnt details about the PTMs of each of the 69 proteins (Table 2, Supplementary Table S1). Their evaluation unveiled a high percentage of the 69 selected proteins with modifiable N6 lysine residues. Hence, the study shows a way to facilitate planning work on possible PTMs of proteins of cells actually changing their type of growth and offers an explanation of earlier findings regarding the LOX gene. In future, the method may complement even advanced methods of proteome analysis with PTM identification facilities [46].

Proteome Data Used
The proteins were identified by mass spectrometry in FTC-133 human follicular thyroid carcinoma cells and in MCF-7 human breast adenocarcinoma cells according to protocols described in refs. [17,23]. Prior to analysis both types of cells had been grown either within a monolayer under normal 1g laboratory conditions or within spheroids exposed to an RPM [10]. After harvest, the cells were lysed and subjected to mass spectrometry, following the protocols described in refs. [17,23,47,48]. Finally, raw data from the mass spectrometer were processed with the MaxQuant (Max Planck Society, Munich, Germany) computational proteomics platform (version 1.5.2.22) [49] using the standard parameters. Relative protein quantification was performed using the LfQ algorithm (label-free quantitation) as described in [50].

Pathway Analysis
To investigate and visualize the original localization and the mutual interactions of detected proteins, we entered the relevant UniProt accession numbers into the Pathway Studio v.11 software (Elsevier Research Solutions, Amsterdam, The Netherlands) [17,23].

Application of the Knowledge Explorer
To create a semantic network, harmonize content from multiple resources, and allow for graphical querying and reasoning, experimental data were imported to establish an initial RDF knowledge base using Sentient Knowledge Explorer (Melissa Informatics, Berkeley, CA, USA; formerly IO Informatics) [34]. The workflow for generating a knowledge base, in which iterative selective SPARQL queries with reasoning are sent to those resources and their results imported into the semantic network, is depicted in Figure 2. UniProt content was queried using its SPARQL endpoint to augment information on enzymes and reported protein functions [41]. The Entrez resources Gene, OMIM, Protein, PubMed and SNP were used via Knowledge Explorer's NCBI Connector services to add content [42,43,51]. Parts of KEGG and Reactome were used to validate pathway information [44]. For classification of posttranslational modifications, the integrated resource for protein post-translational modifications (dbPTM) was used [52]. dbPTM integrates experimentally verified PTMs from several databases and annotates the potential PTMs for all UniProtKB protein entries [53]. It is an aggregated protein-modification and protein-interaction database containing data from seven sources (UniProt, HPRD, Phospho.ELM, PhosphoSitePlus, SysPTM [54], dbSNO [55], MeMo [56]), which categorize more than 80 classes of posttranslational modifications [33,53]. Since the last update, dbPTM also provides disease association based on non-synonymous single nucleotide polymorphisms (nsSNPs). All PTMs experimentally confirmed for each protein were collected and denoted according to their PTM classification. The records were added to establish the final comprehensive SKB.