2.4. Enrichment Analysis in Gene Ontology for the Protein Targets of Penetratin
To identify enriched categories in the Gene Ontology (GO) classes of biological process, molecular function, and cellular component, the DAVID database version 6.8 was used [44,45]. A p-value cutoff of <0.05 was applied to select the significantly enriched terms in all the GO categories.
Under the GO biological process category, DAVID returned 108 enriched terms at a p-value cutoff of <0.05. The top 20 enriched terms of GO biological process for the protein targets of penetratin are listed in Table 1. The complete list of enriched items in GO biological process for the protein targets of penetratin is provided in Supplementary Table S2. Along with the enriched terms in biological process, Table 1 also displays the GO ID for each term, the p-value, the number of protein targets of penetratin in the enriched category, and the total number of yeast proteins that belong to the enriched category. The enriched terms “cellular component biogenesis”, “cellular component organization or biogenesis” and “cellular component assembly” indicate the biological role of the identified protein targets of penetratin in cellular component organization. Ribosome assembly and biogenesis were also highly enriched in GO biological process, with the enriched terms “ribosome biogenesis”, “ribosomal large subunit biogenesis”, “ribosomal large subunit assembly”, and “ribosome assembly”. Several terms for ribonucleoprotein complex biogenesis, such as “ribonucleoprotein complex biogenesis”, “ribonucleoprotein complex subunit organization” and “ribonucleoprotein complex assembly”, also indicate the role of the protein targets of penetratin in ribonucleoprotein complex organization. Moreover, several GO terms were enriched for actin filament organization.
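As a rough illustration of how the two counts reported in Table 1 relate to an enrichment p-value, the sketch below computes a plain hypergeometric upper-tail probability and applies the <0.05 cutoff. It is a minimal sketch and not DAVID's own computation: DAVID reports a modified Fisher exact statistic (the EASE score), and the function names, list size, background size, and per-term counts used here are hypothetical placeholders rather than values from this study.

```python
from math import comb


def hypergeom_upper_tail(hits, list_size, category_size, background_size):
    """P(X >= hits) when list_size proteins are drawn from a background of
    background_size proteins, category_size of which carry the GO term."""
    max_hits = min(list_size, category_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(category_size, k)
        * comb(background_size - category_size, list_size - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


def significant_terms(counts, list_size, background_size, cutoff=0.05):
    """Return {term: p-value} for terms whose p-value is below the cutoff."""
    result = {}
    for term, (hits, category_size) in counts.items():
        p = hypergeom_upper_tail(hits, list_size, category_size, background_size)
        if p < cutoff:
            result[term] = p
    return result


# Hypothetical counts: term -> (targets in category, yeast proteins in category).
example_counts = {
    "term A": (8, 100),
    "term B": (1, 400),
}
enriched = significant_terms(example_counts, list_size=20, background_size=6000)
```

With these placeholder numbers only "term A" passes the cutoff, because eight hits in a category of 100 proteins is far more than expected by chance for a list of 20 proteins.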
The GO enrichment result for biological process (BP) obtained from the DAVID database by selecting “GOTERM_BP_ALL” (i.e., all levels) includes all related terms in the GO hierarchy (sibling terms as well as parent and child terms), thus producing a long list of enriched items. Such redundant lists of GO enrichments make the results difficult to interpret. Hence, the online REVIGO platform was used to obtain a single representative GO term for each group of redundant enrichment terms (i.e., REVIGO removes semantically similar GO terms) [46]. REVIGO analysis with a cutoff of 0.5 identified 29 non-redundant enriched items from the 108 enriched GO biological process terms (displayed in Table 2 with their frequency, uniqueness, and dispensability). The GO items “cellular component biogenesis”, “cellular component organization or biogenesis”, “cellular component organization”, “regulation of cellular component biogenesis” and “cellular component disassembly” are non-redundant terms; thus, they were retained, and together they indicate the assembly, arrangement, and disassembly of a cellular component. Of the ribosome-related terms in Table 1, i.e., “ribosome biogenesis”, “ribosomal large subunit biogenesis”, “ribosomal large subunit assembly”, and “ribosome assembly”, only “ribosomal large subunit biogenesis” was retained, which represents all the other ribosomal processes. Moreover, the GO terms related to ribonucleoprotein organization in Table 1, i.e., “ribonucleoprotein complex biogenesis”, “ribonucleoprotein complex subunit organization” and “ribonucleoprotein complex assembly”, were partly retained in Table 2 as “ribonucleoprotein complex biogenesis” and “ribonucleoprotein complex subunit organization”. In the GO hierarchy, “ribosomal large subunit biogenesis” is a child term of “ribonucleoprotein complex biogenesis”, which is a child of “cellular component biogenesis”, which is a child of “cellular component organization”, which in turn is a child of “cellular process”. Hence, “cellular process” was depicted as the top group in the PANTHER analysis of the biological process.
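The chain from the ribosomal large subunit term up to “cellular process” described above can be traced with a simple parent lookup. The sketch below is illustrative only: the `PARENT` mapping and the `path_to_root` helper are hypothetical names, the mapping contains just the relationships stated in the text, and the real GO graph allows a term to have several parents, which this single-parent dictionary does not model.

```python
# Parent relationships exactly as stated in the text (a simplification:
# the real GO graph is a DAG in which a term may have several parents).
PARENT = {
    "ribosomal large subunit biogenesis": "ribonucleoprotein complex biogenesis",
    "ribonucleoprotein complex biogenesis": "cellular component biogenesis",
    "cellular component biogenesis": "cellular component organization",
    "cellular component organization": "cellular process",
}


def path_to_root(term, parent=PARENT):
    """Follow parent links from a term until a term with no parent is reached."""
    path = [term]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


chain = path_to_root("ribosomal large subunit biogenesis")
```

Here `chain` ends at "cellular process", which is why a specific ribosomal term contributes to that top-level group.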
Protein targets of penetratin show significant enrichment in biological processes related to the organization or biogenesis of cellular component, ribonucleoprotein complex, ribosomal large subunit, protein-containing complex subunit (also known as “macromolecular complex subunit”), organelle, chromosome, and actin cytoskeleton. The “metabolic process” depicted as the second top group of biological process in Figure 3 is associated with several metabolic processes, including “nucleic acid metabolic process”, “heterocycle metabolic process”, “cellular aromatic compound metabolic process”, “organic cyclic compound metabolic process” and “cellular macromolecule metabolic process” (Table 2).
The non-redundant enriched items identified by REVIGO for GO biological process are illustrated in Figure 4A together with the p-value cutoff (the dotted line represents p-value = 0.05). These enrichment terms include several terms related to cellular and metabolic processes. The 29 non-redundant items identified in the biological process (Table 2) can still have some degree of similarity (as only a cutoff parameter of 0.5 was used), and these terms can be clustered together to represent specific processes. The 29 non-redundant biological process terms were clustered by REVIGO, and the results are displayed in
Figure 4B. The non-redundant enrichment items are illustrated by bubbles, and their size reflects the frequency of the items in the Gene Ontology Annotation database (shown in the “Frequency” column in Table 2). Larger bubbles (high frequency) represent more general terms, whereas smaller bubbles (low frequency) indicate more specific terms. For example, the term “regulation of protein acetylation”, with the lowest frequency of 0.1823 (Table 2), is a specific term and is shown by the smallest bubble in Figure 4B. The color of the bubbles (light to dark green) indicates the corresponding p-values of the enriched items, i.e., a term with a p-value close to 0.05 is shown in dark green. Bubbles representing non-redundant enrichment items are connected by a line based on the similarity of the two terms (Figure 4B). The width of the line connecting the bubbles indicates the degree of similarity. Items targeting similar biological processes are clustered together. Two clusters are visualized for the 29 enriched terms in the biological process for the protein targets of penetratin (Figure 4B). One cluster contains GO terms related to component biogenesis or organization and metabolic process, whereas the other cluster consists of GO terms related to gene expression and biological process. The non-redundant term “response to stress” has no similarity to these two clusters.
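The grouping described above, in which bubbles linked by similarity lines form clusters and an unlinked term stands alone, corresponds to finding connected components of a similarity graph. The sketch below shows that idea with a small union-find. It is a sketch under stated assumptions, not REVIGO's own layout procedure: the `cluster_terms` function, the edge threshold, and every similarity score are hypothetical values chosen for illustration.

```python
def cluster_terms(terms, similarity, threshold):
    """Group terms into connected components, linking two terms whenever
    their pairwise similarity is at or above the threshold."""
    parent = {t: t for t in terms}

    def find(t):
        # Walk up to the component representative, compressing the path.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for (a, b), score in similarity.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for t in terms:
        clusters.setdefault(find(t), []).append(t)
    return list(clusters.values())


# Hypothetical similarity scores between a few of the terms named in the text.
example_terms = [
    "cellular component biogenesis",
    "nucleic acid metabolic process",
    "ribonucleoprotein complex biogenesis",
    "response to stress",
]
example_similarity = {
    ("cellular component biogenesis", "ribonucleoprotein complex biogenesis"): 0.6,
    ("cellular component biogenesis", "nucleic acid metabolic process"): 0.4,
    ("nucleic acid metabolic process", "response to stress"): 0.1,
}
example_clusters = cluster_terms(example_terms, example_similarity, threshold=0.3)
```

With these placeholder scores, the three biogenesis and metabolic terms fall into one component and "response to stress" remains a singleton, mirroring the kind of isolated term seen in the figure.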
The significantly enriched items identified in the GO category of cellular component for the protein targets of penetratin are listed in Table 3. A p-value cutoff of <0.05 was applied for the selection of significantly enriched terms in the cellular component. The protein targets of penetratin that contribute to each significantly enriched term in the cellular component are given in the column “Hit in this category”, and all Saccharomyces cerevisiae proteins that belong to that item are shown in the column “Total gene in this category”. The results indicate that the protein targets of penetratin localize inside the cytoplasm and sub-localize in organelles such as “non-membrane-bounded organelle”, “cortical cytoskeleton”, “nucleus”, “nucleolus”, “membrane-enclosed lumen”, “actin cytoskeleton” and others.
The enriched GO terms in the cellular component with their corresponding
p-value were analyzed using REVIGO to identify the non-redundant enrichment items. REVIGO identified 11 non-redundant enrichment items (
Table 4) from the list of 31 enrichment items (
Table 3). The protein targets of penetratin show cellular component enrichment in “cytoplasmic region”, “nucleus”, and “nucleolus”, with sub-localization in “non-membrane-bounded organelle”, “membrane-enclosed lumen”, and others such as “cortical cytoskeleton” (Table 4). Table 4 also displays the frequency, uniqueness, and dispensability of each non-redundant enrichment item in the cellular component for the protein targets of penetratin.
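The redundancy reduction applied to the cellular component terms, and earlier to the biological process terms, can be approximated by a greedy filter: visit terms from the most to the least significant and keep a term only if it is not too similar to any term already kept. This is a deliberately simplified sketch and not REVIGO's actual algorithm, which also weighs term frequency and parent-child relations when deciding which member of a similar pair to drop. The `reduce_redundancy` function and all p-values and similarity scores below are hypothetical.

```python
def reduce_redundancy(pvalues, similarity, cutoff=0.5):
    """Greedy sketch of redundancy reduction.

    pvalues: {term: p-value}; similarity: {(term_a, term_b): score in [0, 1]}.
    The default cutoff of 0.5 mirrors the value reported for the biological
    process analysis; missing pairs are treated as having similarity 0.
    """
    def sim(a, b):
        return similarity.get((a, b), similarity.get((b, a), 0.0))

    kept = []
    # Most significant terms are considered first and act as representatives.
    for term in sorted(pvalues, key=pvalues.get):
        if all(sim(term, k) < cutoff for k in kept):
            kept.append(term)
    return kept


# Hypothetical inputs for three placeholder terms.
example_pvalues = {"term A": 0.001, "term B": 0.01, "term C": 0.02}
example_similarity = {
    ("term A", "term B"): 0.9,  # B is redundant with the more significant A
    ("term B", "term C"): 0.2,
    ("term A", "term C"): 0.1,
}
representatives = reduce_redundancy(example_pvalues, example_similarity)
```

In this toy input "term B" is dropped in favor of "term A", while "term C" is kept because it is not similar enough to any retained term.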
Figure 5A displays the non-redundant enriched terms in the GO category of cellular component for the protein targets of penetratin obtained by REVIGO. These non-redundant items were clustered based on their similarity, as depicted in the interactive graph visualization in Figure 5B. The size of the bubbles corresponds to the frequency of each item in Table 4, and the color of the bubbles, from light to dark green, indicates low to high p-value. The width of the line represents the degree of similarity between the two enrichment items. Based on the similarity in co-localization of the proteins inside the yeast cell, a cluster is observed for “cell cortex”, “cortical cytoskeleton”, “non-membrane-bounded organelle”, “nucleus”, “chromosomal region” and “cortical cytoskeleton”. Another cluster is observed for “preribosome” and “ribonucleoprotein complex”, which is expected, as these terms are related to protein binding to RNA. The three items “protein-containing complex”, “membrane-enclosed lumen”, and “cytoplasmic region” were independent in Figure 5B, indicating that these terms are not related to the two identified clusters.
The DAVID database was used to obtain the list of GO enrichment terms in the molecular function category for the protein targets of penetratin (Table 5). To retain only significant terms, a p-value cutoff of <0.05 was applied. Table 5 shows that the protein targets of penetratin are significantly enriched in the “histone binding” and “protein binding” terms, along with enrichment in “actin filament binding” and others. This indicates that the identified protein targets of penetratin include proteins that function in protein and protein-complex binding, nucleic acid (DNA and RNA) and histone binding, actin filament binding, and ubiquitin-related activities.
The list of enriched GO terms in molecular function for the protein targets of penetratin (Table 5) was analyzed with REVIGO. The results, shown in Table 6, list the non-redundant terms among the enriched terms for molecular function. Of the enriched terms “ubiquitin protein ligase activity” and “ubiquitin-like protein ligase activity”, the term “ubiquitin protein ligase activity” is more specific than “ubiquitin-like protein ligase activity” in the hierarchical ancestor chart for GO:0061630 in the EMBL-EBI database (
https://www.ebi.ac.uk/ (accessed on 5 November 2021)). Hence, “ubiquitin protein ligase activity” was retained as the non-redundant representative of the two enriched terms. The enrichment results in Table 6 are consistent with Figure 3D, which illustrates that the protein targets of penetratin are significantly enriched in “actin filament binding” (i.e., structural molecular activity in Figure 3D), “ubiquitin protein ligase activity” (i.e., catalytic activity in Figure 3D) and so on. The enriched item “macromolecular complex binding” (in Table 5) is replaced with “protein-containing complex binding” (in Table 6).
The non-redundant enrichment items in the GO category of molecular function are shown with their p-values in Figure 6A. Based on the degree of similarity between the enriched terms, these terms were clustered as depicted in Figure 6B. “Protein binding”, which has the second-lowest p-value but the highest frequency (40.91; Table 6), is shown by the largest bubble, whereas “histone binding”, which has the lowest p-value and a frequency of 0.89 (Table 6), is represented by a small bubble. The enrichment terms “single-stranded DNA binding”, “translation factor activity, RNA binding” and “ubiquitin protein ligase activity”, although non-redundant, are functionally similar and are therefore connected in Figure 6B. Moreover, the uniqueness values of 0.91 and 0.93 for “protein binding” and “protein-containing complex binding”, respectively, indicate that these are unique terms, and no connection was observed between these two terms and the other enriched terms in molecular function.