Article

Exploring the Robustness of Causal Structures in Omics Data: A Sweet Cherry Proteogenomic Perspective

by Maria Ganopoulou 1, Aliki Xanthopoulou 2, Michail Michailidis 3, Lefteris Angelis 1, Ioannis Ganopoulos 2 and Theodoros Moysiadis 2,4,*
1 School of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
2 Institute of Plant Breeding and Genetic Resources, ELGO-DIMITRA, 57001 Thessaloniki, Greece
3 Laboratory of Pomology, Department of Horticulture, Aristotle University of Thessaloniki, 57001 Thessaloniki, Greece
4 Department of Computer Science, School of Sciences and Engineering, University of Nicosia, Nicosia 2417, Cyprus
* Author to whom correspondence should be addressed.
Agronomy 2024, 14(1), 8; https://doi.org/10.3390/agronomy14010008
Submission received: 20 November 2023 / Revised: 15 December 2023 / Accepted: 18 December 2023 / Published: 19 December 2023
(This article belongs to the Special Issue Statistical Advances and Modeling in Agriculture)

Abstract

Causal discovery is a highly promising tool with a broad perspective in the field of biology. In this study, a causal structure robustness assessment algorithm is proposed and employed on the causal structures obtained, based on transcriptomic, proteomic, and the combined datasets, emerging from a quantitative proteogenomic atlas of 15 sweet cherry (Prunus avium L.) cv. ‘Tragana Edessis’ tissues. The algorithm assesses the impact of intervening in the datasets of the causal structures, using various criteria. The results showed that specific tissues exhibited an intense impact on the causal structures that were considered. In addition, the proteogenomic case demonstrated that biologically related tissues that referred to the same organ induced a similar impact on the causal structures considered, as was biologically expected. However, this result was subtler in both the transcriptomic and the proteomic cases. Furthermore, the causal structures based on a single omic analysis were found to be impacted to a larger extent, compared to the proteogenomic case, probably due to the distinctive biological features related to the proteome or the transcriptome. This study showcases the significance and perspective of assessing the causal structure robustness based on omic databases, in conjunction with causal discovery, and reveals advantages when employing a multiomics (proteogenomic) analysis compared to a single-omic (transcriptomic, proteomic) analysis.

1. Introduction

The elucidation of gene and protein expression, along with their interplays, stands as a cornerstone in biological sciences [1]. Transcriptome profiling on a large scale has gained traction as a primary technique to scrutinize the variety within biological specimens. Such analyses have traditionally been organ-specific or encompass entire entities, like plants, but there is a growing emphasis on detailed transcriptome profiles of distinct tissues or cells due to their potential to unravel gene functionality [2].
The field of proteogenomics emerged as an innovative approach, expanding the scope of genomic analysis through the integration of transcriptomic and proteomic data [3]. This methodological fusion seeks to concurrently examine alterations at the genetic level—such as mutations, polymorphisms, and insertions/deletions—with those at the protein level [4]. Proteogenomic databases are pivotal in correlating gene expression with protein production, thereby enhancing the comprehension of gene models [3,5,6]. Beyond its established utility in augmenting genome annotation and protein identification in non-model plant species [7], proteogenomics holds considerable promise in biotechnological endeavors, particularly plant breeding. For instance, proteogenomics facilitates a detailed elucidation of biosynthetic pathways, aiming to augment agronomic traits, a concept recently applied to pear breeding through gene co-expression module analysis [8]. Furthermore, proteogenomic analyses contribute to the exploration of alternative splicing events and post-translational modifications in plant species [9]. An application of proteogenomic analysis has also been noted in examining the role of carbamoyltransferase genes during the ripening of fleshy fruits [10]. Nonetheless, the development of bioinformatic tools to adeptly manage and interpret this wealth of proteogenomic data for actionable insights remains in its nascent stages [6].
Causal models are used to investigate the structure of causal relationships between different variables. Causal discovery justifies the causal nature of a relationship between two variables based on its persistence [11]. An advantage of causal model development, compared to traditional statistical association, is the existence of direction in the causal relationships between variables, which characterizes the cause and the effect variable in each relation [11]. This is of particular importance in several scientific fields, including biology, and contrasts with the typically used correlation indices, which are mostly bidirectional. Thus, causal discovery demonstrates a wide potential in the field of biology. Obtaining the causal structure may validate expectations and/or uncover new knowledge, facilitating scientific interpretations. Causal discovery has been used in the literature within different contexts (see, e.g., [12,13,14,15,16,17]). Particularly, in the field of genetics, causal methods or their underlying ideas have been applied, among others, to detect causal relationships among phenotypes [18,19], to infer gene regulatory networks [20,21,22,23,24,25], and to infer causal associations between gene expression and disease [26]. A more detailed discussion on causal discovery in biology can be found in Glymour et al. [27].
Causal structure investigation and Directed Acyclic Graphs (DAGs), in particular, have been employed by our group in several studies that involved multiomics data. Causal structure learning in olive leaves and roots at a proteogenomic level unveiled key interaction networks involved in salt priming in olive trees [28]. A causal model-based multiomics pipeline was introduced in Boutsika et al. [29] to determine the molecular portrait of the PGI potatoes of Naxos Island. Genome-wide DNA methylation, RNA sequencing and quantitative proteomics were exploited, revealing key environment-derived molecular factors, putative epimarkers and key microbes, relevant to authenticating Naxos potato [29]. In addition, causal discovery was employed in sweet cherry multiomics data, leading to understanding the cause–effect relationships that are important in the fruit softening and ripening process in sweet cherry (Prunus avium L.) [30]. The analysis in Ganopoulou et al. [30] was based on a plant tissue proteogenomic atlas that contains a combination of sweet cherry (Prunus avium L.) tree transcriptomic and proteomic datasets (represented by 29,247 genes and 7584 proteins, respectively), involving 15 sweet cherry tissue samples [31]. The sweet cherry, a perennial fruit tree belonging to the Rosaceae family, holds a prominent economic position globally [32,33,34]. Its non-climacteric ripening pattern distinguishes it from other Prunus species like peach and apricot, adding to the significance of its study [35].
A question arising when determining the causal structure is related to the robustness of the causal structure itself. Towards this direction, a causal structure robustness assessment approach has been recently introduced, aiming to assess the robustness of the causal relationships of genetic risk factors that affect the Syntax Score, an index that evaluates the complexity of coronary artery disease [36]. This approach investigated the impact on the obtained causal structures, both local and global, under different levels of interventions, reflected in the increasing number of patients (observations) that were randomly excluded from the datasets considered.
The aim herein was to propose a new causal structure robustness assessment algorithm, specifically designed for single-omic or multiomics data involving plant tissue samples, and employ sweet cherry as an example to apply it and assess the robustness of the related obtained causal structures. In contrast to Ganopoulou et al. [30], where causal discovery referred to the gene/protein consensus modules (clusters) that were obtained by employing weighted gene co-expression network analysis (WGCNA) [37], herein, it referred directly to the genes/proteins. The transcriptomic and proteomic data were separately considered at first, and the results were compared with the case when they were jointly used in a multiomics analysis. The proposed approach assesses the robustness of the obtained causal structures by evaluating the impact of intervening in the datasets of these causal structures using diverse criteria. The differences compared to the approach proposed in Ganopoulou et al. [36] are that, herein, each of the observations (plant tissues) involved is removed, one at a time, from the datasets (compared to the random exclusion of an increasing number of observations/patients in [36]), and the causal structure is re-determined and compared not only to the initial causal structure but, in addition, to the remaining re-determined causal structures. The reason for separately treating each tissue is that plant tissue samples are, in general, straightforwardly related to specific biological functions. Assessing the robustness of causal structures that are obtained, either in a single-omic or in a multiomics context, on top of validating causal discovery and related inferred knowledge and conclusions, may provide valuable insight regarding specific tissues and their impact when determining causal relationships.

2. Materials and Methods

2.1. Directed Acyclic Graphs (DAGs)

Bayesian networks constitute a distinct category of graphical models used for illustrating and explicating the causal relationships among random variables. These networks are constructed as DAGs that were initially proposed by Pearl [38]. DAGs employ nodes (or vertices) and directed edges (or arcs) as fundamental components to visualize the causal structure. Each node typically corresponds to a random variable. The graphical representation enables the visualization of statistical dependencies existing among various variables. Within the framework of DAGs, paths (or chains) are delineated as sequences of interconnected edges. If a directed edge connects variable X to Y, X is identified as the parent (or cause) of Y, while Y is considered the child (or effect) of X. DAGs exhibit an acyclic structure, namely no paths of edges originating from a node and terminating at the same node exist.
A completed partially-directed acyclic graph (CPDAG) represents the Markov equivalence class of a DAG [39]. All DAGs that belong to a specific equivalence class describe the same conditional independence relationships since they are structured with the same skeleton (adjacencies) and the same v-structures. Assume D is a DAG. The skeleton of D is the undirected graph that is formed by removing the directions of all the edges in the DAG. A v-structure in D is an ordered triplet of nodes (x, y, z), such that D contains the directed edges x → y and y ← z; additionally, the nodes x and z are not connected with an edge in D. In a CPDAG, some edges may exhibit an undetermined direction (so-called bidirected/undirected edges). This means that they have opposite directions in different DAGs of the same equivalence class.
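The definitions above can be made concrete with a minimal Python sketch. It is an illustration only and not part of the study's R workflow; the node names and the three toy DAGs are hypothetical. It extracts the skeleton and the v-structures of a DAG and uses them to check whether two DAGs fall in the same Markov equivalence class.

```python
from itertools import combinations

def skeleton(dag_edges):
    """Undirected skeleton: each directed edge (x, y) becomes the pair {x, y}."""
    return {frozenset(edge) for edge in dag_edges}

def v_structures(dag_edges):
    """Triplets (x, y, z) with x -> y <- z where x and z are not adjacent."""
    skel = skeleton(dag_edges)
    parents = {}
    for parent, child in dag_edges:
        parents.setdefault(child, set()).add(parent)
    found = set()
    for child, pars in parents.items():
        for x, z in combinations(sorted(pars), 2):
            if frozenset((x, z)) not in skel:
                found.add((x, child, z))
    return found

def markov_equivalent(dag_a, dag_b):
    """Two DAGs are equivalent if skeleton and v-structures coincide."""
    return (skeleton(dag_a) == skeleton(dag_b)
            and v_structures(dag_a) == v_structures(dag_b))

# Hypothetical toy DAGs on nodes A, B, C (edges are (parent, child) tuples).
chain = {("A", "B"), ("B", "C")}           # A -> B -> C
reversed_chain = {("C", "B"), ("B", "A")}  # A <- B <- C
collider = {("A", "B"), ("C", "B")}        # A -> B <- C

print(markov_equivalent(chain, reversed_chain))  # same skeleton, no v-structures
print(markov_equivalent(chain, collider))        # collider adds a v-structure
```

The two chains share a skeleton and contain no v-structure, so they belong to one equivalence class and their edges would appear with undetermined direction in the CPDAG; the collider does not.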

2.2. Data Description

Sweet cherry cv. ‘Tragana Edessis’ tissue samples (15 in total) were collected (represented by 29,247 genes and 7584 proteins) related to the most important organs. In particular, the tissues covered the annual sweet cherry shoot (“1st Year shoot”), early growth leaves (“Young leaves”), fully developed leaves (“Leaves”), dormant flower (“Flower buds”), dormant vegetative buds (“Dormancy buds”, early in spring, ecodormancy stage), flowers at both white tip stage (“1st Bloom”) and full flowering phase (“Flower”). Sweet cherry fruit (exo-mesocarp) were sampled during four developmental stages, corresponding to the fruit set (8 days after full bloom [DAFB]; “Fruit 1st stage”), the beginning of fruit coloring from green to yellow (20 DAFB; “Fruit 2nd stage”), the coloring from yellow to red (34 DAFB; “Fruit 3rd stage”), and to the fruit ripe for harvest (44 DAFB; “Fruit 4th stage”). The corresponding stems were collected at the same developmental stages (“Stem 1st stage”, “Stem 2nd stage”, “Stem 3rd stage” and “Stem 4th stage”). More details can be found in Xanthopoulou et al. [31]. The transcript expression and protein abundances are available in the SweetBiOmics database (www.GrCherrydb.com, accessed on 1 September 2023).

2.3. Robustness Assessment Algorithm

The causal structure robustness assessment algorithm was tailored for single-omic or multiomics data involving plant tissue samples. It is focused on assessing the robustness of the related obtained causal structures by evaluating the impact of intervening in the datasets based on which these causal structures were determined. It entails the following steps:
i. Determine/estimate the initial causal structure (CPDAG) based on the related omic database (assume that it entails n plant tissues), using a selected causal structure learning algorithm.
ii. Each of the plant tissues (observations) involved is removed, one at a time, from the database and the causal structure is re-determined, resulting in n new causal structures (CPDAGs), each corresponding to a particular plant tissue excluded.
iii. Each of the CPDAGs is compared to all CPDAGs (n + 1, both the initial and the re-determined CPDAGs), each of which is assumed to be the reference causal structure in each comparison.
iv. The comparison is performed based on various metrics. In particular:
    a. The structural Hamming distance (SHD) between two CPDAGs [40]. This distance accounts for the number of operators required to make two CPDAGs match, or more specifically, to add or delete an undirected edge, and add, remove, or reverse the orientation of an edge. The SHD is computed for all pairs of CPDAGs.
    b. The percentage of common bidirected edges. The percentage of bidirected edges in each reference CPDAG that remained bidirected in each CPDAG.
    c. The percentage of common directed edges. The percentage of directed edges in each reference CPDAG that remained directed with the same direction in each CPDAG.
    d. The percentage of directed to bidirected edges. The percentage of directed edges in each reference CPDAG that turned into bidirected in each CPDAG.
    e. The percentage of directed edges that changed direction. The percentage of directed edges in each reference CPDAG that changed direction in each CPDAG.
v. Hierarchical clustering is performed based on the above metrics and/or combinations of the metrics and is used to cluster the n + 1 CPDAGs based on their comparison to all CPDAGs (when considered as the reference causal structures). In particular, the n + 1 vectors, which correspond to the values of a selected metric when each of the n + 1 CPDAGs is compared to all CPDAGs (reference causal structures), are used to hierarchically cluster the CPDAGs.
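The metrics in step (iv) can be sketched in a few lines of Python. This is an illustrative sketch rather than the code used in the study: the CPDAG representation (a dictionary with a set of directed edges and a set of undirected, i.e., bidirected, edges), the function names and the toy graphs are all assumptions. The SHD follows one common convention, counting each pair of nodes whose edge status differs between the two graphs as one operation.

```python
def edge_state(cpdag, u, v):
    """Status of the pair (u, v): '->', '<-', '--' (undetermined) or None."""
    if (u, v) in cpdag["directed"]:
        return "->"
    if (v, u) in cpdag["directed"]:
        return "<-"
    if frozenset((u, v)) in cpdag["undirected"]:
        return "--"
    return None

def adjacent_pairs(cpdag):
    """All unordered node pairs joined by any kind of edge."""
    return {frozenset(e) for e in cpdag["directed"]} | set(cpdag["undirected"])

def shd(ref, other):
    """Number of node pairs whose edge status differs between the graphs."""
    total = 0
    for pair in adjacent_pairs(ref) | adjacent_pairs(other):
        u, v = sorted(pair)
        if edge_state(ref, u, v) != edge_state(other, u, v):
            total += 1
    return total

def edge_percentages(ref, other):
    """Metrics (b)-(e): fate of the reference edges in the other CPDAG."""
    def pct(count, total):
        return 100.0 * count / total if total else float("nan")
    states = [edge_state(other, u, v) for (u, v) in ref["directed"]]
    kept_undirected = sum(1 for e in ref["undirected"] if e in other["undirected"])
    n_dir = len(ref["directed"])
    return {
        "common_bidirected": pct(kept_undirected, len(ref["undirected"])),
        "common_directed": pct(states.count("->"), n_dir),
        "directed_to_bidirected": pct(states.count("--"), n_dir),
        "changed_direction": pct(states.count("<-"), n_dir),
    }

# Hypothetical toy CPDAGs (g1..g9 are placeholder gene names).
reference = {
    "directed": {("g1", "g2"), ("g3", "g2"), ("g4", "g5"), ("g6", "g7")},
    "undirected": {frozenset(("g7", "g8"))},
}
other = {
    "directed": {("g1", "g2"), ("g5", "g4")},
    "undirected": {frozenset(("g3", "g2")), frozenset(("g7", "g8")),
                   frozenset(("g8", "g9"))},
}

distance = shd(reference, other)
result = edge_percentages(reference, other)
print(distance, result)
```

Because the percentages are taken relative to the reference CPDAG, swapping the two arguments generally changes metrics (b) to (e), which is why each CPDAG is treated as the reference in turn in step (iii); the SHD, by contrast, is symmetric.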

Reasoning

The reasoning within Steps (ii) and (iii) is that, since plant tissue samples are, in general, related to specific biological functions, by removing the data related to a specific plant tissue from the database and re-determining the causal structure, valuable insight may emerge pertaining to this tissue. For example, based on the differences between the re-determined causal structure compared to the initial causal structure, conclusions can be drawn regarding the impact of a specific tissue on the estimated initial causal relationships. In addition, CPDAGs corresponding to the exclusion of biologically similar tissues (e.g., covering the same organ) may be expected to be similar. If not, it may be of interest to understand the differences in the corresponding causal structures. Moreover, two re-determined CPDAGs may exhibit many differences compared to the initial CPDAG, and at the same time be very similar to each other, or exhibit many differences as well when being compared to each other. The selection of criteria in Step (iv) aimed to efficiently describe the CPDAG comparison, based both on an overall assessment (SHD) and specific characteristics of the CPDAGs, such as similarities in bidirected/directed edges and changes that have occurred from one CPDAG to another. The reasoning within Step (v) is to evaluate the obtained CPDAG clusters and draw specific conclusions. An interesting aspect is to investigate whether CPDAGs corresponding to the exclusion of biologically similar tissues are clustered together.

2.4. Sweet Cherry Proteogenomic Atlas–Causal Structure Robustness Assessment

The analysis was based on the protein abundances and the transcript FPKMs in the Sweet Cherry Proteogenomic Atlas [31]. Initially, the pre-processing of the data was performed as described in Xanthopoulou et al. [31]. Additionally, only gene/protein pairs with valid values for all tissues at both proteomic and transcriptomic levels were selected (n = 7244). Of these, only the gene/protein pairs with values greater than 1 in at least 5 tissues (one out of three) at both proteomic and transcriptomic levels were further assessed, resulting in 6332 cases. Both the proteomic and the transcriptomic data were standardized per protein/gene ID across all 15 tissues.
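The filtering and standardization described above can be sketched as follows. This Python snippet is illustrative only: the data layout (one list of tissue values per gene ID), the function names and the toy values are hypothetical, and the use of the sample standard deviation for standardization is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev

def passes_filter(values, threshold=1.0, min_tissues=5):
    """True if the value exceeds the threshold in at least min_tissues tissues."""
    return sum(v > threshold for v in values) >= min_tissues

def standardize(values):
    """Z-score across tissues (assumes a non-constant profile and sample SD)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def preprocess(transcripts, proteins):
    """Keep gene/protein pairs passing the filter at both levels, then scale."""
    kept = {}
    for gene_id in sorted(transcripts.keys() & proteins.keys()):
        t, p = transcripts[gene_id], proteins[gene_id]
        if passes_filter(t) and passes_filter(p):
            kept[gene_id] = (standardize(t), standardize(p))
    return kept

# Hypothetical toy data: two gene/protein pairs measured in six tissues.
transcripts = {
    "geneA": [2.0, 3.0, 4.0, 5.0, 6.0, 0.5],
    "geneB": [0.2, 0.1, 3.0, 0.4, 0.3, 0.2],
}
proteins = {
    "geneA": [1.5, 2.5, 3.5, 4.5, 5.5, 6.5],
    "geneB": [2.0, 2.1, 2.2, 2.3, 2.4, 2.5],
}

processed = preprocess(transcripts, proteins)
print(sorted(processed))
```

In this toy example geneB is dropped because its transcript exceeds 1 in only one tissue, even though its protein passes the filter; the filter must hold at both levels.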
Then, the causal structure robustness assessment algorithm, described in Section 2.3, was separately performed in three cases, (a) single-omic: only the transcriptomic data were considered (n = 6332, 15 tissues), (b) single-omic: only the proteomic data were considered (n = 6332, 15 tissues), and (c) multiomics: both the transcriptomic and the proteomic data were considered (n = 6332, 30 tissues).
At step (i) of the algorithm, the causal relationships among (a) the 6332 genes, (b) the 6332 proteins, and (c) the 6332 gene/protein pairs were initially determined based on all 15 tissues with the constraint-based PC algorithm [41,42], which is a common algorithm used to learn the structure of a causal Bayesian network, named after its inventors, Peter Spirtes and Clark Glymour. The CPDAG that was obtained was represented by “All15T” (in all three cases (a), (b) and (c)). As is typical, it was assumed that causal sufficiency holds [11]. This condition implies that for each pair of measured variables, all their common direct causes are measured as well. That is to say, there are no hidden, unmeasured confounders for any pair of variables. The PC algorithm was applied using the R package “MXM” [43]. The skeleton of the causal network was developed with the function “pc.con”, which performs a faster implementation of the PC algorithm compared to the “pc.skel” function (in the same R package), but is limited to continuous data only, as was the case in this study. The method argument was set to “pearson”, and the significance level “alpha” was set to 0.01, both being the default values in this function.
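Constraint-based algorithms such as PC decide whether to keep an edge by testing (conditional) independence. With Pearson correlation, a commonly used test is Fisher's z-test on the (partial) correlation. The Python sketch below illustrates that test only; it is not the MXM implementation, and whether “pc.con” computes exactly this statistic is an assumption. The toy vectors are hypothetical and were chosen so that x and y are correlated only through z.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """First-order partial correlation of x and y given a single variable z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

def fisher_z_pvalue(r, n, k=0):
    """Two-sided p-value for H0: (partial) correlation is zero.

    n is the number of observations, k the size of the conditioning set;
    requires n - k - 3 > 0 and |r| < 1.
    """
    z = 0.5 * math.log((1 + r) / (1 - r))
    stat = math.sqrt(n - k - 3) * abs(z)
    return math.erfc(stat / math.sqrt(2))

# Hypothetical toy vectors: x and y both depend on z plus unrelated noise.
z = [-2, -1, 0, 1, 2]
x = [-1, -3, 2, -1, 3]
y = [-3, 1, 0, -1, 3]

alpha = 0.01  # significance level reported in the text
r_marginal = pearson(x, y)
r_given_z = partial_corr(x, y, z)
p_given_z = fisher_z_pvalue(r_given_z, n=len(x), k=1)
print(r_marginal, r_given_z, p_given_z > alpha)
```

Here the marginal correlation between x and y is clearly non-zero, while the partial correlation given z vanishes, so a PC-style procedure would remove the x–y edge once z is conditioned on. With only 14 or 15 tissues per analysis, the factor n − k − 3 is small, which limits the power of such tests for larger conditioning sets.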
Then (step (ii)), each of the available 15 tissues was removed from each dataset, one at a time, and the causal structure was re-determined resulting in 15 newly determined CPDAGs in each case, based on 14, 14 and 28 tissues considered, respectively, ((a), (b) and (c)), since in the case of the proteogenomic analysis each tissue was removed from both datasets. These 15 re-determined CPDAGs were represented by “Ti”, where i = 1, 2, …, 15, corresponded to the ith tissue that was removed from the analysis. Namely, “Ti” stands for the CPDAG that was based on all 15 tissues except for the ith tissue. For example, if the 5th tissue was removed from a dataset, then the re-determined CPDAG was represented by “T5”. The order of the 15 tissues was “1st Bloom”, “1st Year shoot”, “Fruit 1st stage”, “Fruit 2nd stage”, “Fruit 3rd stage”, “Fruit 4th stage”, “Stem 1st stage”, “Stem 2nd stage”, “Stem 3rd stage”, “Stem 4th stage”, “Dormancy buds”, “Flower”, “Flower buds”, “Leaves”, and “Young leaves”. Thus, the corresponding CPDAGs when each of these 15 tissues was removed from each dataset were “T1”, “T2”, “T3”, …, “T14” and “T15”. The names of the CPDAGs were the same in all three analyses, i.e., “T3” represented the CPDAG not including the third tissue in the case of the transcriptomic, the proteomic analysis, and the proteogenomic analysis.
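The leave-one-tissue-out procedure and its naming scheme can be sketched as below. The Python code is illustrative and not the study's R code. In particular, `toy_learner` is a deliberately simple stand-in (it links variables whose absolute Pearson correlation exceeds a hypothetical cut-off of 0.9) and is not the PC algorithm; any structure learner with the same interface could be plugged in. Variable names and values are made up.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def toy_learner(data, cutoff=0.9):
    """Stand-in for a structure learner (NOT the PC algorithm)."""
    return {frozenset((a, b)) for a, b in combinations(sorted(data), 2)
            if abs(pearson(data[a], data[b])) > cutoff}

def leave_one_tissue_out(data, n_tissues, learn_structure):
    """Learn one structure on all tissues and one per excluded tissue.

    data maps each variable to a list with one value per tissue; the result
    uses the labels 'All<n>T', 'T1', ..., 'T<n>' as in the text.
    """
    structures = {"All%dT" % n_tissues: learn_structure(data)}
    for i in range(n_tissues):
        reduced = {var: vals[:i] + vals[i + 1:] for var, vals in data.items()}
        structures["T%d" % (i + 1)] = learn_structure(reduced)
    return structures

# Hypothetical toy data: three variables measured in four tissues.
data = {
    "g1": [1.0, 2.0, 3.0, 10.0],
    "g2": [2.0, 4.0, 6.0, 1.0],
    "g3": [5.0, 1.0, 4.0, 2.0],
}

structures = leave_one_tissue_out(data, n_tissues=4, learn_structure=toy_learner)
for label, edges in structures.items():
    print(label, sorted(tuple(sorted(e)) for e in edges))
```

In this toy case the fourth tissue masks the g1–g2 relationship, so the link appears only in “T4”. For the proteogenomic case described in the text, the same tissue index would be removed from both the transcriptomic and the proteomic observations before re-learning.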
Next, each of the 16 CPDAGs (“All15T”, “T1”, “T2”, …, “T15”) was compared to all 16 CPDAGs in each case (step (iii)). In step (iv), the metrics employed were the SHD, the percentage of the directed edges in each of the 16 causal structures (when considered as reference) that remained directed with the same direction in each of the 16 CPDAGs, and the percentage of the directed edges in each of the 16 reference CPDAGs that either remained directed with the same direction, or were transformed into a bidirected edge in each of the 16 CPDAGs. The above metrics were employed in hierarchical clustering in step (v). The hierarchical clustering was performed and visualized with the “pheatmap” function in R, using the Euclidean distance as the clustering distance and the complete linkage clustering method. The absolute numbers of all the metrics in step (iv) of the algorithm ((iv) a–e, Section 2.3) were computed as well.
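To illustrate what clustering with the Euclidean distance and the complete linkage method does with such comparison vectors, the following Python sketch implements a naive agglomerative procedure. It is not the “pheatmap” implementation, and the SHD-like values are invented for four hypothetical CPDAGs (“All3T”, “T1”, “T2”, “T3”).

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def complete_linkage(vectors):
    """Naive agglomerative clustering with complete linkage.

    vectors maps a label to its vector of metric values. Returns the list of
    merges as (cluster_a, cluster_b, distance), in the order they occur.
    """
    clusters = [(label,) for label in vectors]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: distance of the two farthest members.
                d = max(euclidean(vectors[a], vectors[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical SHD-like matrix: row = CPDAG, columns = reference CPDAGs
# in the order All3T, T1, T2, T3.
shd_rows = {
    "All3T": [0, 10, 12, 40],
    "T1": [10, 0, 8, 42],
    "T2": [12, 8, 0, 41],
    "T3": [40, 42, 41, 0],
}

merges = complete_linkage(shd_rows)
for left, right, dist in merges:
    print(left, right, round(dist, 2))
```

In this invented example “T1” and “T2” merge first, and “T3”, whose row differs strongly from all others, joins last; this is the pattern one would read off the dendrogram for a tissue whose removal has an outsized impact.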
All the analyses were performed with the R programming language, Version 4.2.1.

3. Results

In the transcriptomic case, the results of the comparison based on the SHD and the percentage of the directed edges in each of the reference causal structures that remained directed with the same direction in each of the remaining causal structures are displayed in Figure 1. Figure 1A shows that removing the 11th tissue (“Dormancy buds”) from the transcriptomic database had the highest impact of all cases, since the CPDAG T11 exhibited the highest SHD compared to All15T (2256) among all CPDAGs, and, in addition, T11 exhibited the highest SHD to each of the reference CPDAGs, compared to all other CPDAGs. The smallest SHD compared to All15T was observed in the case of CPDAG T5 (882). Other than that, it was observed that CPDAGs corresponding to the removal of tissues with similar biological functions, such as tissues 3–6 (“Fruit 1st stage”, “Fruit 2nd stage”, “Fruit 3rd stage”, and “Fruit 4th stage”) were not clustered together in the heatmap. Similarly, the CPDAGs T7–T10 (“Stem 1st stage”, “Stem 2nd stage”, “Stem 3rd stage”, and “Stem 4th stage”), the CPDAGs T11–T13 (“Dormancy buds”, “Flower”, “Flower buds”), and T14–T15 (“Leaves”, “Young leaves”) were not clustered together either (Figure 1A).
In the case of the assessment of the percentage of directed edges in each of the reference causal structures that remained directed with the same direction in each of the remaining 15 CPDAGs (Figure 1B), it was similarly observed that CPDAGs corresponding to the removal of tissues with similar biological function did not cluster together, with the exception of CPDAGs T4–T6. In addition, CPDAG T3 (corresponding to the exclusion of “Fruit 1st stage”) was the one that exhibited the lowest percentage in almost all cases (except when compared to T4). The highest percentage (similarity in terms of directed edges) compared to All15T was observed in the case of CPDAG T8 (58.82%).
Finally, when assessing the percentage of directed edges in each of the reference CPDAGs that either remained directed with the same direction or turned into bidirected, in each of the remaining causal structures (Figure A1), it was observed that the CPDAGs corresponding to excluded tissues with similar biological function did not, in general, cluster together with the exception of CPDAGs T4–T6, as was the case also in Figure 1B, and T8–T9. No CPDAG exhibited very low or very high percentages compared to the remaining CPDAGs. Still, T3 was the one that exhibited the lowest percentage compared to the CPDAG All15T (23.53%), while the highest percentage, respectively, was observed again in the case of T8 (70.59%).
The numerical details regarding the metrics that were considered, in particular, the SHD between each pair of CPDAGs, the number of bidirected edges in each reference CPDAG that remained bidirected in each of the remaining causal structures, and the number of directed edges in each reference CPDAG that remained directed with the same direction, or turned into bidirected, or changed direction in each of the remaining causal structures are displayed in Table A1. It was observed that the number of directed edges in each reference CPDAG that changed direction was in almost all cases zero, with a few exceptions of ones. Particularly, in comparison to the CPDAG All15T, only in the case of CPDAGs T3 and T6 one change in direction was observed, while in all other cases, the direction changes were zero.
In Figure 2A, it is shown that in the proteomic case, when the first tissue (“1st Bloom”) was removed from the database, the impact was the highest of all cases since CPDAG T1 exhibited by far the highest SHD compared to All15T (2672) among all CPDAGs. T1 exhibited the highest SHD to each of the CPDAGs considered as reference causal structures, compared to all other CPDAGs as well. The smallest SHD compared to All15T was observed in the case of CPDAG T9 (1312). On the other hand, the CPDAGs corresponding to the exclusion of tissues 3–5, related to fruit stages, were clustered together in the heatmap (Figure 2A). Similarly, the CPDAGs T7–T10, corresponding to the four stem stages, were also clustered together. Moreover, CPDAGs T11–T13 (corresponding to the tissues “Dormancy buds”, “Flower”, “Flower buds”), and T14–T15 (“Leaves”, “Young leaves”) were clustered very close to each other, respectively, as well (Figure 2A).
This was not the case, however, when the percentage of directed edges in each of the reference causal structures that remained directed with the same direction in each of the remaining causal structures was assessed (Figure 2B). In this case, it was observed that CPDAGs corresponding to the exclusion of tissues with similar biological functions did not cluster together. In addition, there was no CPDAG that demonstrated systematically high or low values of percentages of common directed edges to each of the reference CPDAGs, compared to all other CPDAGs. Still, the lowest percentage of common directed edges with the All15T CPDAG was observed in the case of T1 with 5.88% (Figure 2B), while the highest percentage was observed in the case of CPDAGs T3 and T14 (35.29%).
When the percentage of directed edges in each of the reference causal structures that either remained directed with the same direction or turned into bidirected in each of the remaining causal structures was assessed (Figure A2), the results were very similar to the previous case (Figure 2B). More specifically, the CPDAGs corresponding to excluded tissues with similar biological functions did not cluster together and the lowest percentage of directed edges in the CPDAG All15T that either were retained or turned into bidirected was observed in the case of T1 with 29.41% (Figure A2). The highest percentage involved CPDAG T9 (61.76%) followed by CPDAG T3 (58.82%).
Numerical details regarding the metrics considered are displayed in Table A2. As in the case of the transcriptomic analysis, it was observed that the number of directed edges in each reference CPDAG that changed direction was in almost all cases zero. Compared to the CPDAG All15T, one change in direction was observed only in the case of CPDAGs T2 and T8, while in all other cases, the direction changes were zero.
In the proteogenomic case, it is shown in Figure 3A that when the 1st tissue (“1st Bloom”) was removed from the combined database, the impact was the highest of all cases and CPDAG T1 exhibited by far the highest SHD among all CPDAGs, compared to All15T (3270). T1 exhibited the highest SHD to each of the CPDAGs when treated as a reference, compared to all other CPDAGs as well. The smallest SHD compared to the CPDAG All15T was observed in the case of CPDAG T9 (1654). These results are similar to the respective results in the proteomic analysis. Moreover, the CPDAGs corresponding to the exclusion of tissues 3–4, which are related to the fruiting stage, were clustered together in the heatmap (Figure 3A). Similarly, the CPDAGs T7–T10 that correspond to the four stem stages were also clustered close to each other. In addition, CPDAGs T11–T12 (corresponding to the tissues “Dormancy buds” and “Flower”), and T14–T15 (“Leaves”, “Young leaves”) were clustered together, respectively, as well (Figure 3A).
These results were observed as well when assessing the percentage of directed edges in each of the reference causal structures that retained their direction in each of the remaining causal structures (Figure 3B). In this case, the results were even more emphatic, since the CPDAGs T4–T6 (corresponding to fruit stages), the CPDAGs T7–T10 (four stem stages), the CPDAGs T11–T13 (buds and flowers) and the CPDAGs T14–T15 (leaves) were also clustered together, respectively (Figure 3B). On top of that, similarly to the assessment of SHD (Figure 3A), the CPDAG that exhibited the least similarity to all other reference CPDAGs was T1. Particularly, when compared to All15T, it exhibited a percentage of 25.19%, which was by far lower than that of any other CPDAG (the highest percentage was 52.89% and was observed in the cases of both T7 and T9).
Next, the percentage of directed edges in each of the reference causal structures that either remained directed with the same direction or turned into bidirected, in each of the remaining causal structures was assessed (Figure A3). The results were very similar to the previous case (Figure 3B) and the case when the SHD was assessed (Figure 3A). On top of the fact that CPDAGs corresponding to the exclusion of biologically similar tissues were clustered together, CPDAG T1 again exhibited the least similarity to all other reference CPDAGs. By far the lowest percentage of directed edges in the CPDAG All15T that either were retained or turned into bidirected was 46.74% (T1), while the highest was 70.80% followed by 70.62% (again involving T9 and T7, respectively).
Numerical details regarding the metrics considered are displayed in Table A3. In this case, it was observed that the number of directed edges in each reference CPDAG that changed direction was large in almost all cases. When considering the CPDAG All15T as a reference, CPDAG T1 exhibited the highest number of direction changes (86), which additionally was the highest number of direction changes overall (Table A3).

4. Discussion

This study proposes a causal structure robustness assessment algorithm for single-omic or multiomics data involving plant tissues and demonstrates its application on a sweet cherry proteogenomic atlas. The robustness assessment of the causal structures based on the transcriptomic, proteomic, and combined proteogenomic datasets showed that different tissues exhibited the highest impact, in each case, when excluded from the corresponding databases. Notably, excluding the tissues “Dormancy buds” and “Fruit 1st stage” influenced the causal structure based on the transcriptomic data markedly more than excluding any of the remaining tissues (based on the SHD, Figure 1A and the percentage of directed edges retained, Figure 1B, respectively). This could be attributed to the fact that while the transcriptional regulation in dormant buds is limited, during the transition from endodormancy to ecodormancy an explosion of transcription activity has been observed that clearly divided the ecodormancy from other dormancy stages in sweet cherries [44,45,46]. Similarly, the early stage of fruit development is characterized by elevated transcriptional activity due to continuous cell division that progressively decreases in the following stages [47,48]. Hence, the endorsement of transcriptional activity was observed.
On the other hand, in both the proteomic and the proteogenomic analysis, excluding the first tissue (“1st Bloom”) from the data had the highest impact on the causal structure compared to all other tissues (Figure 2A and Figure 3A,B). The high impact of excluding the “1st Bloom” tissue was even more evident in the proteogenomic analysis. This may be attributed to the fact that “1st Bloom” has a protein abundance profile that clearly separates it from the remaining tissues. This was also observed and discussed in a previous publication of our group [31], and was likewise noted in Arabidopsis thaliana, where exclusive proteins were found in abundance in the pollen, callus, seed, and flower of the plant [49].
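The exclusion scheme discussed above (one causal structure from all tissues, plus one per excluded tissue) can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the function name, the toy measurements, and the three-tissue example are hypothetical, and the structure-learning step applied to each dataset is deliberately left out.

```python
def leave_one_tissue_out(samples_by_tissue):
    """Build the full dataset plus one reduced dataset per excluded tissue.

    `samples_by_tissue` maps a tissue name to its list of sample rows.
    The returned keys follow the naming used in the text: "All<k>T" for
    the full data and "T<i>" for the data without the i-th tissue.
    """
    tissues = list(samples_by_tissue)
    datasets = {
        f"All{len(tissues)}T": [
            row for tissue in tissues for row in samples_by_tissue[tissue]
        ]
    }
    for index, excluded in enumerate(tissues, start=1):
        datasets[f"T{index}"] = [
            row
            for tissue in tissues
            if tissue != excluded
            for row in samples_by_tissue[tissue]
        ]
    return datasets


# Toy input (hypothetical values): three tissues, two variables per sample.
toy = {
    "1st Bloom": [[0.9, 1.2], [1.0, 1.1]],
    "Leaves": [[0.2, 0.4]],
    "Young leaves": [[0.3, 0.5], [0.25, 0.45]],
}
datasets = leave_one_tissue_out(toy)
for name, rows in datasets.items():
    print(name, len(rows))
```

With 15 tissues, the same loop would yield the 16 datasets (All15T and T1 to T15) from which the 16 CPDAGs compared in the text are learned.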
Moreover, it was expected that the CPDAGs corresponding to the exclusion of biologically similar tissues would be clustered together or close to each other. In particular, this concerned the CPDAGs corresponding to tissues 3–5, related to the fruiting stage; the CPDAGs T7–T10, corresponding to the four stem stages; the CPDAGs T11–T13, corresponding to the tissues “Dormancy buds”, “Flower” and “Flower buds”; and the CPDAGs T14–T15, corresponding to the tissues “Leaves” and “Young leaves”. This expectation was based on the fact that what theoretically changes within each group is the developmental stage and not the histology. When assessing the clustering of the CPDAGs based on their SHD to all other CPDAGs, and on the percentage of directed edges shared with all other CPDAGs (either retained as directed only, or retained as directed or turned into bidirected), it was found that this expectation was not satisfied in the transcriptomic case (Figure 1 and Figure A1). Indeed, in a transcriptomic analysis of quality changes during sweet cherry fruit development, the transcriptome of the organs has been found to exhibit strong differentiation depending on the developmental stage [50,51].
This expectation was satisfied, however, in both the proteomic and the proteogenomic analysis, at least when the SHD was used as the metric to assess the causal structure differences (Figure 2A and Figure 3A). When the hierarchical clustering was instead based on the percentage of the directed edges in each reference causal structure that remained directed with the same direction in each of the remaining CPDAGs, and on the percentage that either remained directed with the same direction or turned into bidirected, the expectation was fulfilled only in the proteogenomic analysis (Figure 3A,B). This is probably because the combined analysis of proteome and transcriptome managed to separate tissues belonging to different organs by reducing the noise that arises when the proteome or the transcriptome is analyzed alone.
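To make the metrics behind these comparisons concrete, the following sketch counts, for a toy pair of CPDAGs, the quantities reported in Tables A1 to A3 (BiD, Dir, DtoB, DCh) together with an SHD. It is an illustration under stated assumptions rather than the study's implementation: the edge-set encoding, the function names, and the node labels are hypothetical, and the SHD convention used here (one unit per node pair whose edge state differs) is one common choice that may differ from the one used in the study.

```python
def compare_cpdags(reference, other):
    """Count how the edges of `reference` appear in `other`.

    Encoding (an assumption of this sketch): each CPDAG is a set of
    ordered pairs. A directed edge u -> v is stored as (u, v) only; a
    bidirected edge is stored as both (u, v) and (v, u).
    """
    counts = {"BiD": 0, "Dir": 0, "DtoB": 0, "DCh": 0}
    for u, v in reference:
        forward, backward = (u, v) in other, (v, u) in other
        if (v, u) in reference:
            # Bidirected in the reference: count once, via the sorted pair.
            if u < v and forward and backward:
                counts["BiD"] += 1
        elif forward and backward:
            counts["DtoB"] += 1  # directed edge turned into bidirected
        elif forward:
            counts["Dir"] += 1   # stayed directed with the same direction
        elif backward:
            counts["DCh"] += 1   # changed direction
    return counts


def shd(a, b):
    """Number of node pairs whose edge state differs between two graphs."""
    def state(graph, pair):
        u, v = sorted(pair)
        return ((u, v) in graph, (v, u) in graph)

    pairs = {frozenset(edge) for edge in a | b}
    return sum(state(a, pair) != state(b, pair) for pair in pairs)


# Toy graphs: A -> B, B -> C, C -> D, D <-> E in the reference.
reference = {("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "D")}
# In the other graph: A -> B kept, B <-> C, D -> C (reversed), D <-> E kept.
other = {("A", "B"), ("B", "C"), ("C", "B"), ("D", "C"), ("D", "E"), ("E", "D")}

result = compare_cpdags(reference, other)
distance = shd(reference, other)
print(result, distance)
```

In this toy case, one of the three directed reference edges is retained (33.3%), and two of the three are either retained or turned into bidirected (66.7%), which mirrors the two percentage criteria used for the hierarchical clustering.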
The fact that the expectation that CPDAGs corresponding to the removal of tissues with similar biological function would be clustered together was best satisfied in the proteogenomic analysis additionally implies that collecting and analyzing fewer tissue samples from an organ, when this is relevant or necessary, may be more feasible when a combined proteogenomic analysis is considered. Namely, in a proteogenomic analysis, opting to use fewer representative tissues of a specific organ is expected to result in a milder impact on the corresponding causal structure, compared to a single-omic analysis of the proteome or the transcriptome.
An overall comparison of the percentages of the directed edges in each of the reference causal structures that remained directed with the same direction in each of the remaining 15 CPDAGs (Figure 1B, Figure 2B and Figure 3B) clearly shows that they are much higher in the proteogenomic analysis than in the other two cases. The SHD cannot be used for this comparison, because it is largely influenced by the complexity of the CPDAGs considered, which is much more pronounced in the proteogenomic analysis (see Table A1, Table A2 and Table A3 in Appendix A for more details). The above result therefore indicates that more robust causal structures are obtained when the proteome and transcriptome datasets are employed jointly. These results reinforce the importance of a multi-omics approach to capture the full breadth of biological processes [52,53,54].
Lastly, the biological implications of analyzing and assessing the robustness of the obtained causal structures concern the reliability of the biological interpretations that may emerge from these causal structures. Namely, a robustness analysis may further support the biological conclusions emerging from causal discovery and at the same time highlight hidden aspects in the data.

5. Conclusions

By employing the proposed algorithm to assess omic sweet cherry causal structures, it was shown that specific tissues had a strong impact when removed from the analysis. In the proteogenomic case, excluding biologically related tissues induced a similar impact on the causal structures. This result was less pronounced in the proteomic analysis and especially in the transcriptomic analysis, which may be attributed to the distinctive biological features of the proteome and the transcriptome. Moreover, causal structures based on single-omic analyses were impacted to a larger extent than those based on the proteogenomic analysis. Thus, this study reveals the importance of assessing causal structure robustness alongside causal discovery within the framework of omics data, and showcases the advantages this adds. Furthermore, while not questioning the importance of separately performing a transcriptomic or proteomic analysis, it provides valuable insight into the benefits of employing a proteogenomic analysis, further supporting the use of multiomics analysis in research.

Author Contributions

Conceptualization, M.G. and T.M.; Data curation, M.G. and A.X.; Formal analysis, M.G. and T.M.; Investigation, M.G., M.M. and I.G.; Methodology, M.G. and T.M.; Project administration, T.M.; Resources, A.X.; Software, M.G.; Supervision, I.G. and T.M.; Validation, A.X., M.M. and I.G.; Visualization, M.G. and T.M.; Writing, original draft, M.G. and T.M.; Writing, review and editing, A.X., M.M., L.A., I.G. and T.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The transcript expression and protein abundances are available in the SweetBiOmics database (www.GrCherrydb.com, accessed on 1 September 2023).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Figure A1. Transcriptomic analysis. The comparison of each of the 16 CPDAGs (rows) to all 16 reference CPDAGs (columns) is displayed. Hierarchical clustering was performed by row. The values shown are the percentages of the directed edges in each of the reference causal structures that either remained directed with the same direction or turned into bidirected in each of the 16 CPDAGs.
Figure A2. Proteomic analysis. The comparison of each of the 16 CPDAGs (rows) to all 16 reference CPDAGs (columns) is displayed. Hierarchical clustering was performed by row. The values shown are the percentages of the directed edges in each of the reference causal structures that either remained directed with the same direction or turned into bidirected in each of the 16 CPDAGs.
Figure A3. Proteogenomic analysis. The comparison of each of the 16 CPDAGs (rows) to all 16 reference CPDAGs (columns) is displayed. Hierarchical clustering was performed by row. The values shown are the percentages of the directed edges in each of the reference causal structures that either remained directed with the same direction or turned into bidirected in each of the 16 CPDAGs.
Table A1. Transcriptomic analysis. The comparison of each of the 16 CPDAGs (rows) to all 16 reference CPDAGs (columns) is numerically displayed. The comparison is based on the structural Hamming distance (SHD) between each pair of CPDAGs, the number of bidirected edges in each reference CPDAG (columns) that remained bidirected (BiD) in each of the remaining CPDAGs (rows), the number of directed edges in each reference CPDAG (columns) that remained directed with the same direction (Dir) in each of the remaining CPDAGs (rows), the number of directed edges in each reference CPDAG (columns) that turned into bidirected (DtoB) in each of the remaining CPDAGs (rows), and the number of directed edges in each reference CPDAG (columns) that changed direction (DCh) in each of the remaining CPDAGs (rows).
CPDAGs Metrics All15T T1 T2 T3 T4 T5 T6 T7 T8 T9 T10 T11 T12 T13 T14 T15
All15TSHD01822181817411268882119113341064900164422561540147817721932
BiD1125669677690803907804780841890700553732761661625
Dir3481041010148201810610181012
DtoB010778410876129156106
DCh0001001000000000
T1SHD1822025022604226820742255231221662074250129002352235225402570
BiD6691117506470552608541536565595487390531540470464
Dir8346224241010846866
DtoB90258353754851096
DCh0000001000100000
T2SHD1818250202590223820422259224221182022245828822436234025502592
BiD6775061131482567623546561584615502402515551474465
Dir10630224966106610866
DtoB890774441051047766
DCh0000001000000000
T3SHD1741260425900218620422154219121262022250728102420233425762582
BiD6904704821113571614564563573607483410511546458457
Dir42222422244224424
DtoB4920424696686666
DCh1000000100100000
T4SHD1268226822382186014381730185216961606210226002106207022502368
BiD8035525675711110760664646676707579459586607534510
Dir102242881261210426666
DtoB8108504829812101010126
DCh0000000000000000
T5SHD882207420422042143801457166014901316189825041832185620682190
BiD9076086236147601127740700737787638490662668589562
Dir1044281884108628464
DtoB910779010887119101065
DCh0000001000000000
T6SHD1191225522592154173014570183416591481205126222005203122732429
BiD8045415465646647401078634669721576439593600513480
Dir1429212830810124610864
DtoB710464204869611985
DCh1110010011101111
T7SHD1334231222422191185216601834017681642213226522139209021942398
BiD7805365615636467006341102653693569443570599545497
Dir846264818810668686
DtoB10104595701065411665
DCh0001000000001000
T8SHD1064216621182126169614901659176801318189024821942192820142242
BiD8415655845736767376696531088766621480613631582530
Dir2010641210108301610410141212
DtoB476562106056611783
DCh0000001000000000
T9SHD900207420222022160613161481164213180183024121822180420162182
BiD8905956156077077877216937661102643503652669589551
Dir1810104108121016288610141010
DtoB48655584701079776
DCh0000001000000000
T10SHD1644250124582507210218982051213218901830027142335236823562622
BiD7004875024835796385765696216431091425520525502436
Dir1086246461083648888
DtoB76658311596067796
DCh0101001000001000
T11SHD2256290028822810260025042622265224822412271402782268427502827
BiD5533904024104594904394434805034251102414452407391
Dir64622266464246444
DtoB7653876365607583
DCh0000000000000001
T12SHD1540235224362420210618322005213919421822233527820223023362490
BiD7325315155115866625935706136525204141104566512477
Dir10610468108101086366106
DtoB5734505474840785
DCh0000001100100000
T13SHD1478235223402334207018562031209019281804236826842230024162338
BiD7615405515466076686005996316695254525661130506529
Dir1888464861414846361012
DtoB58737684438710063
DCh0000001000000000
T14SHD1772254025502576225020682273219420142016235627502336241602424
BiD6614704744585345895135455825895024075125061080481
Dir10662666812108410102810
DtoB83567610467476602
DCh0000001000000000
T15SHD1932257025922582236821902429239822422182262228272490233824240
BiD6254644654575105624804975305514363914775294811083
Dir1266464461210846121026
DtoB4545445333878240
DCh0000001000010000
Table A2. Proteomic analysis. The comparison of each of the 16 CPDAGs (rows) to all 16 reference CPDAGs (columns) is numerically displayed. The comparison is based on the structural Hamming distance (SHD) between each pair of CPDAGs, the number of bidirected edges in each reference CPDAG (columns) that remained bidirected (BiD) in each of the remaining CPDAGs (rows), the number of directed edges in each reference CPDAG (columns) that remained directed with the same direction (Dir) in each of the remaining CPDAGs (rows), the number of directed edges in each reference CPDAG (columns) that turned into bidirected (DtoB) in each of the remaining CPDAGs (rows), and the number of directed edges in each reference CPDAG (columns) that changed direction (DCh) in each of the remaining CPDAGs (rows).
CPDAGs Metrics All15T T1 T2 T3 T4 T5 T6 T7 T8 T9 T10 T11 T12 T13 T14 T15
All15TSHD0267215571546148414501794139013451312131416561616183219141938
BiD1049399669645680679599679707717712632646585572563
Dir34281288910810101059126
DtoB0121115111916132015111211231010
DCh0010000010000000
T1SHD2672030342938298229503095288229022928286429822944305030883046
BiD3991066317318323324291326340336342319331300298303
Dir24806440240624444
DtoB809846117875761353
DCh0000001000000000
T2SHD1557303402186223622022366209021182090204223532259233924322514
BiD6693171062498500502466514524535537467495471453427
Dir80324245444422322
DtoB88071010109149109101488
DCh1000000000011100
T3SHD1546293821860218621002370204220942048200622802240240825112496
BiD6453184981007489502441502505520524463474429409408
Dir1264501012310121210297108
DtoB83100591361185961563
DCh0000000000000010
T4SHD1484298222362186020772363209020562100202222442214240424562446
BiD6803235004891040524459506530525535488499444440437
Dir8421042828661023442
DtoB66970101081566971775
DCh0000011000000000
T5SHD1450295022022100207702244206220402050205622282222240224342428
BiD6793245025025241027479504526531521481488437436434
Dir8441284448106625584
DtoB56865013912441181765
DCh0000100000000000
T6SHD1794309523662370236322440232823122300231424582450255726222616
BiD5992914664414594791033443466473459427437406393393
Dir90532452811240033
DtoB58788110712771281684
DCh0100100000000100
T7SHD1390288220902042209020622328019411826185621872112231223302338
BiD6793265145025065044431001538569555478501448447443
Dir1024108883646641342
DtoB971110712130141061191598
DCh0000000010010000
T8SHD1345290221182094205620402312194101860187221942139226122732396
BiD7073405245055305264665381033578566495510476481443
Dir8441261014468627868
DtoB739881114100889101455
DCh1000000100001110
T9SHD1312292820902048210020502300182618600190221602170226023122329
BiD7173365355205255314735695781039564505509480474463
Dir1004126616842434566
DtoB117108912151214071441486
DCh0000000000000001
T10SHD1314286420422006202220562314185618721902021502084226723222316
BiD7123425375245355214595555665641024500521469463460
Dir10641010626643625644
DtoB651186111711161201251387
DCh0000000000000100
T11SHD1656298223532280224422282458218721942160215002260238825082392
BiD6323194674634884814274784955055001032484447423448
Dir102222244232463626
DtoB7711107151591388071395
DCh0010000100000000
T12SHD1616294422592240221422222450211221392170208422600240024782466
BiD6463314954744994884375015105095214841043444433432
Dir54293501745338566
DtoB12881181114131111109018107
DCh0010000010000000
T13SHD1832305023392408240424022557231222612260226723882400025802550
BiD5853004714294444374064484764804694474441027404405
Dir94374503856656057
DtoB5565799688858052
DCh0010001010100000
T14SHD1914308824322511245624342622233022732312232225082478258002567
BiD5722984534094404363934474814744634234334041044408
Dir1242104834664265406
DtoB7510661015111067851306
DCh0001000010000001
T15SHD1938304625142496244624282616233823962329231623922466255025670
BiD5633034274084374343934434434634604484324054081036
Dir64282432864667632
DtoB76999101061195441540
DCh0000000001000010
Table A3. Proteogenomic analysis. The comparison of each of the 16 CPDAGs (rows) to all 16 reference CPDAGs (columns) is numerically displayed. The comparison is based on the structural Hamming distance (SHD) between each pair of CPDAGs, the number of bidirected edges in each reference CPDAG (columns) that remained bidirected (BiD) in each of the remaining CPDAGs (rows), the number of directed edges in each reference CPDAG (columns) that remained directed with the same direction (Dir) in each of the remaining CPDAGs (rows), the number of directed edges in each reference CPDAG (columns) that turned into bidirected (DtoB) in each of the remaining CPDAGs (rows), and the number of directed edges in each reference CPDAG (columns) that changed direction (DCh) in each of the remaining CPDAGs (rows).
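CPDAGs | Metrics | All15T | T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | T9 | T10 | T11 | T12 | T13 | T14 | T15
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
All15T | SHD | 0 | 3270 | 1879 | 1994 | 2110 | 1897 | 2288 | 1698 | 1711 | 1654 | 1807 | 2245 | 2185 | 2413 | 2344 | 2379
All15T | BiD | 1444 | 784 | 1047 | 1029 | 1002 | 1037 | 967 | 1064 | 1084 | 1111 | 1049 | 960 | 988 | 940 | 957 | 954
All15T | Dir | 1072 | 270 | 524 | 512 | 462 | 493 | 437 | 567 | 542 | 567 | 531 | 448 | 461 | 445 | 423 | 400
All15T | DtoB | 0 | 216 | 151 | 145 | 166 | 182 | 192 | 161 | 139 | 118 | 157 | 193 | 152 | 166 | 160 | 178
All15T | DCh | 0 | 86 | 61 | 55 | 68 | 52 | 69 | 65 | 59 | 57 | 54 | 63 | 69 | 73 | 78 | 77
T1 | SHD | 3270 | 0 | 3660 | 3679 | 3713 | 3730 | 3875 | 3563 | 3549 | 3570 | 3609 | 3786 | 3733 | 3910 | 3903 | 3824
T1 | BiD | 784 | 1509 | 718 | 723 | 712 | 720 | 683 | 725 | 736 | 753 | 722 | 700 | 718 | 672 | 666 | 688
T1 | Dir | 270 | 948 | 221 | 201 | 206 | 203 | 194 | 235 | 203 | 208 | 193 | 199 | 186 | 207 | 188 | 197
T1 | DtoB | 231 | 0 | 212 | 197 | 199 | 191 | 207 | 210 | 209 | 192 | 203 | 202 | 196 | 192 | 198 | 208
T1 | DCh | 86 | 0 | 58 | 78 | 63 | 65 | 62 | 68 | 65 | 69 | 84 | 64 | 69 | 72 | 69 | 56
T2 | SHD | 1879 | 3660 | 0 | 2654 | 2762 | 2574 | 2971 | 2479 | 2436 | 2439 | 2499 | 2878 | 2842 | 3008 | 3002 | 3008
T2 | BiD | 1047 | 718 | 1488 | 908 | 893 | 927 | 845 | 929 | 953 | 960 | 929 | 858 | 876 | 831 | 827 | 843
T2 | Dir | 524 | 221 | 950 | 365 | 342 | 351 | 302 | 368 | 356 | 353 | 383 | 315 | 326 | 297 | 292 | 287
T2 | DtoB | 204 | 218 | 0 | 177 | 180 | 206 | 207 | 211 | 184 | 192 | 187 | 214 | 185 | 206 | 205 | 191
T2 | DCh | 61 | 58 | 0 | 59 | 60 | 51 | 72 | 80 | 74 | 64 | 56 | 72 | 58 | 72 | 60 | 74
T3 | SHD | 1994 | 3679 | 2654 | 0 | 2685 | 2741 | 2948 | 2605 | 2576 | 2567 | 2650 | 2915 | 2842 | 3097 | 3019 | 3026
T3 | BiD | 1029 | 723 | 908 | 1468 | 912 | 887 | 867 | 896 | 924 | 937 | 892 | 848 | 873 | 814 | 829 | 834
T3 | Dir | 512 | 201 | 365 | 941 | 336 | 328 | 313 | 364 | 340 | 370 | 360 | 335 | 338 | 295 | 305 | 275
T3 | DtoB | 179 | 199 | 187 | 0 | 170 | 193 | 180 | 199 | 177 | 161 | 192 | 187 | 172 | 201 | 183 | 205
T3 | DCh | 55 | 78 | 59 | 0 | 62 | 51 | 56 | 65 | 65 | 61 | 56 | 56 | 53 | 66 | 68 | 63
T4 | SHD | 2110 | 3713 | 2762 | 2685 | 0 | 2677 | 3034 | 2606 | 2643 | 2619 | 2623 | 2960 | 2956 | 3124 | 3073 | 3107
T4 | BiD | 1002 | 712 | 893 | 912 | 1483 | 891 | 837 | 912 | 913 | 931 | 913 | 846 | 850 | 819 | 823 | 826
T4 | Dir | 462 | 206 | 342 | 336 | 928 | 346 | 315 | 361 | 344 | 342 | 354 | 307 | 303 | 278 | 280 | 281
T4 | DtoB | 206 | 213 | 191 | 182 | 0 | 201 | 188 | 193 | 184 | 172 | 185 | 203 | 193 | 210 | 201 | 197
T4 | DCh | 68 | 63 | 60 | 62 | 0 | 60 | 63 | 77 | 59 | 66 | 52 | 64 | 66 | 72 | 63 | 61
T5 | SHD | 1897 | 3730 | 2574 | 2741 | 2677 | 0 | 2811 | 2535 | 2512 | 2533 | 2590 | 2849 | 2873 | 2995 | 2994 | 3092
T5 | BiD | 1037 | 720 | 927 | 887 | 891 | 1492 | 867 | 906 | 926 | 937 | 903 | 854 | 862 | 838 | 832 | 818
T5 | Dir | 493 | 203 | 351 | 328 | 346 | 935 | 324 | 360 | 339 | 339 | 343 | 326 | 315 | 295 | 280 | 265
T5 | DtoB | 216 | 211 | 199 | 207 | 209 | 0 | 232 | 230 | 214 | 196 | 220 | 213 | 198 | 211 | 207 | 216
T5 | DCh | 52 | 65 | 51 | 51 | 60 | 0 | 55 | 63 | 53 | 55 | 48 | 54 | 54 | 62 | 63 | 57
T6 | SHD | 2288 | 3875 | 2971 | 2948 | 3034 | 2811 | 0 | 2778 | 2848 | 2790 | 2854 | 3098 | 3113 | 3328 | 3200 | 3253
T6 | BiD | 967 | 683 | 845 | 867 | 837 | 867 | 1477 | 867 | 863 | 904 | 852 | 813 | 822 | 775 | 801 | 800
T6 | Dir | 437 | 194 | 302 | 313 | 315 | 324 | 963 | 357 | 336 | 345 | 329 | 306 | 277 | 263 | 265 | 258
T6 | DtoB | 197 | 206 | 201 | 168 | 180 | 195 | 0 | 197 | 183 | 151 | 190 | 200 | 184 | 197 | 187 | 193
T6 | DCh | 69 | 62 | 72 | 56 | 63 | 55 | 0 | 72 | 61 | 72 | 64 | 57 | 66 | 69 | 69 | 56
T7 | SHD | 1698 | 3563 | 2479 | 2605 | 2606 | 2535 | 2778 | 0 | 2262 | 2167 | 2358 | 2670 | 2669 | 2883 | 2804 | 2783
T7 | BiD | 1064 | 725 | 929 | 896 | 912 | 906 | 867 | 1454 | 977 | 1004 | 939 | 879 | 898 | 842 | 861 | 871
T7 | Dir | 567 | 235 | 368 | 364 | 361 | 360 | 357 | 987 | 409 | 451 | 410 | 364 | 356 | 347 | 345 | 326
T7 | DtoB | 190 | 220 | 194 | 196 | 175 | 206 | 193 | 0 | 168 | 150 | 178 | 205 | 168 | 197 | 181 | 195
T7 | DCh | 65 | 68 | 80 | 65 | 77 | 63 | 72 | 0 | 59 | 57 | 60 | 69 | 70 | 66 | 59 | 68
T8 | SHD | 1711 | 3549 | 2436 | 2576 | 2643 | 2512 | 2848 | 2262 | 0 | 2231 | 2307 | 2704 | 2725 | 2875 | 2837 | 2807
T8 | BiD | 1084 | 736 | 953 | 924 | 913 | 926 | 863 | 977 | 1475 | 1007 | 968 | 889 | 892 | 851 | 868 | 878
T8 | Dir | 542 | 203 | 356 | 340 | 344 | 339 | 336 | 409 | 934 | 404 | 397 | 340 | 315 | 317 | 296 | 319
T8 | DtoB | 188 | 234 | 204 | 195 | 184 | 214 | 201 | 198 | 0 | 161 | 174 | 201 | 188 | 213 | 202 | 196
T8 | DCh | 59 | 65 | 74 | 65 | 59 | 53 | 61 | 59 | 0 | 58 | 54 | 62 | 79 | 71 | 63 | 59
T9 | SHD | 1654 | 3570 | 2439 | 2567 | 2619 | 2533 | 2790 | 2167 | 2231 | 0 | 2280 | 2663 | 2663 | 2873 | 2780 | 2802
T9 | BiD | 1111 | 753 | 960 | 937 | 931 | 937 | 904 | 1004 | 1007 | 1520 | 985 | 907 | 920 | 865 | 893 | 889
T9 | Dir | 567 | 208 | 353 | 370 | 342 | 339 | 345 | 451 | 404 | 909 | 415 | 347 | 327 | 311 | 342 | 310
T9 | DtoB | 192 | 232 | 221 | 194 | 203 | 231 | 198 | 201 | 193 | 0 | 193 | 222 | 200 | 228 | 191 | 219
T9 | DCh | 57 | 69 | 64 | 61 | 66 | 55 | 72 | 57 | 58 | 0 | 50 | 66 | 58 | 64 | 61 | 55
T10 | SHD | 1807 | 3609 | 2499 | 2650 | 2623 | 2590 | 2854 | 2358 | 2307 | 2280 | 0 | 2774 | 2776 | 2904 | 2854 | 2897
T10 | BiD | 1049 | 722 | 929 | 892 | 913 | 903 | 852 | 939 | 968 | 985 | 1460 | 861 | 868 | 844 | 848 | 856
T10 | Dir | 531 | 193 | 383 | 360 | 354 | 343 | 329 | 410 | 397 | 415 | 941 | 359 | 329 | 315 | 317 | 297
T10 | DtoB | 194 | 227 | 188 | 180 | 175 | 202 | 207 | 203 | 185 | 162 | 0 | 197 | 195 | 205 | 197 | 196
T10 | DCh | 54 | 84 | 56 | 56 | 52 | 48 | 64 | 60 | 54 | 50 | 0 | 59 | 57 | 65 | 55 | 56
T11 | SHD | 2245 | 3786 | 2878 | 2915 | 2960 | 2849 | 3098 | 2670 | 2704 | 2663 | 2774 | 0 | 3021 | 3143 | 3101 | 3058
T11 | BiD | 960 | 700 | 858 | 848 | 846 | 854 | 813 | 879 | 889 | 907 | 861 | 1473 | 826 | 805 | 817 | 828
T11 | Dir | 448 | 199 | 315 | 335 | 307 | 326 | 306 | 364 | 340 | 347 | 359 | 964 | 303 | 289 | 275 | 282
T11 | DtoB | 220 | 212 | 198 | 190 | 188 | 216 | 204 | 206 | 195 | 182 | 191 | 0 | 201 | 207 | 195 | 199
T11 | DCh | 63 | 64 | 72 | 56 | 64 | 54 | 57 | 69 | 62 | 66 | 59 | 0 | 59 | 59 | 65 | 62
T12 | SHD | 2185 | 3733 | 2842 | 2842 | 2956 | 2873 | 3113 | 2669 | 2725 | 2663 | 2776 | 3021 | 0 | 3196 | 3138 | 3072
T12 | BiD | 988 | 718 | 876 | 873 | 850 | 862 | 822 | 898 | 892 | 920 | 868 | 826 | 1487 | 806 | 812 | 836
T12 | Dir | 461 | 186 | 326 | 338 | 303 | 315 | 277 | 356 | 315 | 327 | 329 | 303 | 920 | 300 | 265 | 262
T12 | DtoB | 211 | 211 | 197 | 182 | 187 | 206 | 211 | 205 | 200 | 194 | 201 | 206 | 0 | 186 | 196 | 200
T12 | DCh | 69 | 69 | 58 | 53 | 66 | 54 | 66 | 70 | 79 | 58 | 57 | 59 | 0 | 56 | 52 | 78
T13 | SHD | 2413 | 3910 | 3008 | 3097 | 3124 | 2995 | 3328 | 2883 | 2875 | 2873 | 2904 | 3143 | 3196 | 0 | 3299 | 3207
T13 | BiD | 940 | 672 | 831 | 814 | 819 | 838 | 775 | 842 | 851 | 865 | 844 | 805 | 806 | 1470 | 771 | 805
T13 | Dir | 445 | 207 | 297 | 295 | 278 | 295 | 263 | 347 | 317 | 311 | 315 | 289 | 300 | 962 | 259 | 274
T13 | DtoB | 197 | 200 | 209 | 184 | 174 | 200 | 201 | 196 | 188 | 188 | 189 | 199 | 174 | 0 | 194 | 186
T13 | DCh | 73 | 72 | 72 | 66 | 72 | 62 | 69 | 66 | 71 | 64 | 65 | 59 | 56 | 0 | 67 | 61
T14 | SHD | 2344 | 3903 | 3002 | 3019 | 3073 | 2994 | 3200 | 2804 | 2837 | 2780 | 2854 | 3101 | 3138 | 3299 | 0 | 3169
T14 | BiD | 957 | 666 | 827 | 829 | 823 | 832 | 801 | 861 | 868 | 893 | 848 | 817 | 812 | 771 | 1473 | 799
T14 | Dir | 423 | 188 | 292 | 305 | 280 | 280 | 265 | 345 | 296 | 342 | 317 | 275 | 265 | 259 | 918 | 266
T14 | DtoB | 204 | 211 | 213 | 181 | 184 | 212 | 202 | 200 | 190 | 167 | 198 | 200 | 199 | 205 | 0 | 207
T14 | DCh | 78 | 69 | 60 | 68 | 63 | 63 | 69 | 59 | 63 | 61 | 55 | 65 | 52 | 67 | 0 | 69
T15 | SHD | 2379 | 3824 | 3008 | 3026 | 3107 | 3092 | 3253 | 2783 | 2807 | 2802 | 2897 | 3058 | 3072 | 3207 | 3169 | 0
T15 | BiD | 954 | 688 | 843 | 834 | 826 | 818 | 800 | 871 | 878 | 889 | 856 | 828 | 836 | 805 | 799 | 1503
T15 | Dir | 400 | 197 | 287 | 275 | 281 | 265 | 258 | 326 | 319 | 310 | 297 | 282 | 262 | 274 | 266 | 910
T15 | DtoB | 224 | 220 | 216 | 204 | 191 | 226 | 211 | 217 | 196 | 197 | 207 | 216 | 196 | 205 | 200 | 0
T15 | DCh | 77 | 56 | 74 | 63 | 61 | 57 | 56 | 68 | 59 | 55 | 56 | 62 | 78 | 61 | 69 | 0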

References

  1. Liu, Y.; Beyer, A.; Aebersold, R. On the dependency of cellular protein levels on mRNA abundance. Cell 2016, 165, 535–550.
  2. Buccitelli, C.; Selbach, M. mRNAs, proteins and the emerging principles of gene expression control. Nat. Rev. Genet. 2020, 21, 630–644.
  3. Faulkner, S.; Dun, M.; Hondermarck, H. Proteogenomics: Emergence and promise. Cell. Mol. Life Sci. 2015, 72, 953–957.
  4. Lazar, I.; Karcini, A.; Ahuja, S.; Estrada-Palma, C. Proteogenomic analysis of protein sequence alterations in breast cancer cells. Sci. Rep. 2019, 9, 10381.
  5. Nesvizhskii, A. Proteogenomics: Concepts, applications and computational strategies. Nat. Methods 2014, 11, 1114–1125.
  6. Low, T.; Mohtar, M.; Ang, M.; Jamal, R. Connecting Proteomics to Next-Generation Sequencing: Proteogenomics and Its Current Applications in Biology. Proteomics 2019, 19, 1800235.
  7. Song, Y.C.; Das, D.; Zhang, Y.; Chen, M.X.; Fernie, A.R.; Zhu, F.Y.; Han, J. Proteogenomics-based functional genome research: Approaches, applications, and perspectives in plants. Trends Biotechnol. 2023, 41, 1532–1548.
  8. Wang, P.; Wu, X.; Shi, Z.; Tao, S.; Liu, Z.; Qi, K.; Xie, Z.; Qiao, X.; Gu, C.; Yin, H.; et al. A large-scale proteogenomic atlas of pear. Mol. Plant 2023, 16, 599–615.
  9. Chen, M.X.; Zhu, F.; Gao, B.; Ma, K.; Zhang, Y.; Fernie, A.; Chen, X.; Dai, L.; Ye, N.H.; Zhang, X.; et al. Full-length transcript-based proteogenomics of rice improves its genome and proteome annotation. Plant Physiol. 2020, 182, 1510–1526.
  10. Dhar, Y.V.; Asif, M.H. Genome and transcriptome-wide study of carbamoyltransferase genes in major fleshy fruits: A multi-omics study of evolution and functional significance. Front. Plant Sci. 2022, 13, 994159.
  11. Li, J.; Liu, L.; Le, T.D. Practical Approaches to Causal Relationship Exploration; Springer: Berlin/Heidelberg, Germany, 2015.
  12. Gąsior, J.; Młyńczak, M.; Williams, C.; Popłonyk, A.; Kowalska, D.; Giezek, P.; Werner, B. The discovery of a data-driven causal diagram of sport participation in children and adolescents with heart disease: A pilot study. Front. Cardiovasc. Med. 2023, 10, 1247122.
  13. Krethong, P.; Jirapaet, V.; Jitpanya, C.; Sloan, R. A causal model of health-related quality of life in Thai patients with heart-failure. J. Nurs. Scholarsh. 2008, 40, 254–260.
  14. Tangkawanich, T.; Yunibhand, J.; Thanasilp, S.; Magilvy, K. Causal model of health: Health-related quality of life in people living with HIV/AIDS in the northern region of Thailand. Nurs. Health Sci. 2008, 10, 216–221.
  15. Raghu, V.K.; Zhao, W.; Pu, J.; Leader, J.; Wang, R.; Herman, J.; Yuan, J.; Benos, P.; Wilson, D. Feasibility of lung cancer prediction from low-dose CT scan and smoking factors using causal models. Thorax 2019, 74, 643–649.
  16. Shen, X.; Ma, S.; Vemuri, P.; Simon, G. Challenges and opportunities with causal discovery algorithms: Application to Alzheimer’s pathophysiology. Sci. Rep. 2020, 10, 2975.
  17. Piccininni, M.; Konigorski, S.; Rohmann, J.; Kurth, T. Directed acyclic graphs and causal thinking in clinical risk prediction modeling. BMC Med. Res. Methodol. 2020, 20, 179.
  18. Neto, E.; Ferrara, C.; Attie, A.; Yandell, B. Inferring causal phenotype networks from segregating populations. Genetics 2008, 179, 1089–1100.
  19. Neto, E.; Keller, M.; Attie, A.; Yandell, B. Causal graphical models in systems genetics: A unified framework for joint inference of causal network and genetic architecture for correlated phenotypes. Ann. Appl. Stat. 2010, 4, 320.
  20. Zhang, X.; Zhao, X.; He, K.; Lu, L.; Cao, Y.; Liu, J.; Hao, J.; Liu, Z.; Chen, L. Inferring gene regulatory networks from gene expression data by path consistency algorithm based on conditional mutual information. Bioinformatics 2012, 28, 98–104.
  21. Wu, H.; Liu, X. Dynamic bayesian networks modeling for inferring genetic regulatory networks by search strategy: Comparison between greedy hill climbing and mcmc methods. Int. J. Comput. Inf. Eng. 2008, 2, 2585–2595.
  22. Vasimuddin, M.; Aluru, S. Parallel exact dynamic bayesian network structure learning with application to gene networks. In Proceedings of the 2017 IEEE 24th International Conference on High Performance Computing (HiPC), Jaipur, India, 18–21 December 2017.
  23. Wille, A.; Zimmermann, P.; Vranová, E.; Fürholz, A.; Laule, O.; Bleuler, S.; Bühlmann, P. Sparse graphical Gaussian modeling of the isoprenoid gene network in Arabidopsis thaliana. Genome Biol. 2004, 5, R92.
  24. Yu, J.; Smith, V.; Wang, P.; Hartemink, A.; Jarvis, E. Advances to Bayesian network inference for generating causal networks from observational biological data. Bioinformatics 2004, 20, 3594–3603.
  25. Ram, R.; Chetty, M. A markov-blanket-based model for gene regulatory network inference. IEEE/ACM Trans. Comput. Biol. Bioinform. 2009, 8, 353–367.
  26. Schadt, E.; Lamb, J.; Yang, X.; Zhu, J.; Edwards, S.; GuhaThakurta, D.; Lusis, A. An integrative genomics approach to infer causal associations between gene expression and disease. Nat. Genet. 2005, 37, 710–717.
  27. Glymour, C.; Zhang, K.; Spirtes, P. Review of causal discovery methods based on graphical models. Front. Genet. 2019, 10, 524.
  28. Skodra, C.; Michailidis, M.; Moysiadis, T.; Stamatakis, G.; Ganopoulou, M.; Adamakis, I.; Angelis, E.; Ganopoulos, I.; Tanou, G.; Samiotaki, M.; et al. Disclosing the molecular basis of salinity priming in olive trees using proteogenomic model discovery. Plant Physiol. 2022, 191, 1913–1933.
  29. Boutsika, A.; Michailidis, M.; Ganopoulou, M.; Dalakouras, A.; Skodra, C.; Xanthopoulou, A.; Stamatakis, G.; Samiotaki, M.; Tanou, G.; Moysiadis, T.; et al. A wide foodomics approach coupled with metagenomics elucidates the environmental signature of potatoes. iScience 2023, 26, 105917.
  30. Ganopoulou, M.; Michailidis, M.; Angelis, L.; Ganopoulos, I.; Molassiotis, A.; Xanthopoulou, A.; Moysiadis, T. Could Causal Discovery in Proteogenomics Assist in Understanding Gene–Protein Relations? A Perennial Fruit Tree Case Study Using Sweet Cherry as a Model. Cells 2021, 11, 92.
  31. Xanthopoulou, A.; Moysiadis, T.; Bazakos, C.; Karagiannis, E.; Karamichali, I.; Stamatakis, G.; Tanou, G. The perennial fruit tree proteogenomics atlas: A spatial map of the sweet cherry proteome and transcriptome. Plant J. 2022, 109, 1319–1336.
  32. Alkio, M.; Jonas, U.; Declercq, M.; Van Nocker, S.; Knoche, M. Transcriptional dynamics of the developing sweet cherry (Prunus avium L.) fruit: Sequencing, annotation and expression profiling of exocarp-associated genes. Hortic. Res. 2014, 1, 11.
  33. Berni, R.; Charton, S.; Planchon, S.; Romi, M.; Cantini, C.; Guerriero, G. Molecular investigation of Tuscan sweet cherries sampled over three years: Gene expression analysis coupled to metabolomics and proteomics. Hortic. Res. 2021, 8, 12.
  34. Karagiannis, E.; Sarrou, E.; Michailidis, M.; Tanou, G.; Ganopoulos, I.; Bazakos, C.; Molassiotis, A. Fruit quality trait discovery and metabolic profiling in sweet cherry genebank collection in Greece. Food Chem. 2021, 342, 128315.
  35. Michailidis, M.; Karagiannis, E.; Tanou, G.; Samiotaki, M.; Tsiolas, G.; Sarrou, E.; Molassiotis, A. Novel insights into the calcium action in cherry fruit development revealed by high-throughput mapping. Plant Mol. Biol. 2020, 104, 597–614.
  36. Ganopoulou, M.; Moysiadis, T.; Gounaris, A.; Mittas, N.; Chatzopoulou, F.; Chatzidimitriou, D.; Sianos, G.; Vizirianakis, I.S.; Angelis, L. Single Nucleotide Polymorphisms’ Causal Structure Robustness within Coronary Artery Disease Patients. Biology 2023, 12, 709.
  37. Langfelder, P.; Horvath, S. WGCNA: An R package for weighted correlation network analysis. BMC Bioinform. 2008, 9, 559.
  38. Pearl, J. Causal diagrams for empirical research. Biometrika 1995, 82, 669–688.
  39. Nagarajan, R.; Scutari, M.; Lèbre, S. Bayesian Networks in R; Springer: Berlin/Heidelberg, Germany, 2013; Volume 122, pp. 125–127.
  40. Tsamardinos, I.; Brown, L.; Aliferis, C. The max-min hill-climbing Bayesian network structure learning algorithm. Mach. Learn. 2006, 65, 31–78.
  41. Neapolitan, R.E. Learning Bayesian Networks; Prentice Hall: Hoboken, NJ, USA, 2003.
  42. Spirtes, P.; Glymour, C.C.; Scheines, R. Causation, Prediction, and Search, 2nd ed.; The MIT Press: Cambridge, MA, USA, 2000.
  43. Tsagris, M.; Bordoudakis, G.; Lagani, V.; Tsamardinos, I. Constraint-based causal discovery with mixed data. Int. J. Data Sci. Anal. 2018, 6, 19–40.
  44. Villar, L.; Lienqueo, I.; Llanes, A.; Rojas, P.; Perez, J.; Correa, F.; Sagredo, B.; Masciarelli, O.; Luna, V.; Almada, R. Comparative transcriptomic analysis reveals novel roles of transcription factors and hormones during the flowering induction and floral bud differentiation in sweet cherry trees (Prunus avium L. cv. Bing). PLoS ONE 2020, 15, e0230110.
  45. Vimont, N.; Fouche, M.; Campoy, J.A.; Tong, M.; Arkoun, M.; Yvin, J.C.; Wigge, A.; Dirlewanger, E.; Cortijo, S.; Wenden, B. From bud formation to flowering: Transcriptomic state defines the cherry developmental phases of sweet cherry bud dormancy. BMC Genom. 2019, 20, 974. [Google Scholar] [CrossRef]
  46. Rothkegel, K.; Sandoval, P.; Soto, E.; Ulloa, L.; Riveros, A.; Lillo-Carmona, V.; Cáceres-Molina, J.; Almeida, A.; Meneses, C. Dormant but active: Chilling accumulation modulates the epigenome and transcriptome of Prunus avium during bud dormancy. Front. Plant Sci. 2020, 11, 1115. [Google Scholar] [CrossRef] [PubMed]
  47. Yang, H.; Tian, C.; Ji, S.; Ni, F.; Fan, X.; Yang, Y.; Sun, C.; Gong, H.; Zhang, A. Integrative analyses of metabolome and transcriptome reveals metabolomic variations and candidate genes involved in sweet cherry (Prunus avium L.) fruit quality during development and ripening. PLoS ONE 2021, 16, e0260004. [Google Scholar] [CrossRef] [PubMed]
  48. Michailidis, M.; Bazakos, C.; Kollaros, M.; Adamakis, I.D.S.; Ganopoulos, I.; Molassiotis, A.; Tanou, G. Boron stimulates fruit formation and reprograms developmental metabolism in sweet cherry. Physiol. Plant. 2023, 175, 13946. [Google Scholar] [CrossRef] [PubMed]
  49. Mergner, J.; Frejno, M.; List, M.; Papacek, M.; Chen, X.; Chaudhary, A.; Samaras, P.; Richter, S.; Shikata, H.; Messerer, M.; et al. Mass-spectrometry-based draft of the Arabidopsis proteome. Nature 2020, 579, 409–414. [Google Scholar] [CrossRef] [PubMed]
  50. Zhang, Y.; Chen, C.; Cui, Y.; Du, Q.; Tang, W.; Yang, W.; Kou, G.; Tang, W.; Chen, H.; Gong, R. Potential regulatory genes of light induced anthocyanin accumulation in sweet cherry identified by combining transcriptome and metabolome analysis. Front. Plant Sci. 2023, 14, 1238624. [Google Scholar] [CrossRef] [PubMed]
  51. Chen, C.; Chen, H.; Yang, W.; Li, J.; Tang, W.; Gong, R. Transcriptomic and metabolomic analysis of quality changes during sweet cherry fruit development and mining of related genes. Int. J. Mol. Sci. 2022, 23, 7402. [Google Scholar] [CrossRef]
  52. Sirangelo, T. Multi-omics approaches in the study of plants. Int. J. Adv. Res. Bot. 2019, 5, 1–7. [Google Scholar]
  53. Hasin, Y.; Seldin, M.; Lusis, A. Multi-omics approaches to disease. Genome Biol. 2017, 18, 83. [Google Scholar] [CrossRef]
  54. Mahmood, U.; Li, X.; Fan, Y.; Chang, W.; Niu, Y.; Li, J.; Qu, C.; Lu, K. Multi-omics revolution to promote plant breeding efficiency. Front. Plant Sci. 2022, 13, 1062952. [Google Scholar] [CrossRef]
Figure 1. Transcriptomic analysis. The comparison of each of the 16 CPDAGs (rows) to all 16 reference CPDAGs (columns) is displayed. Hierarchical clustering was performed by row. (A) SHDs, (B) The percentages of the directed edges in each of the reference causal structures that remained directed with the same direction in each of the 16 CPDAGs.
Figure 2. Proteomic analysis. The comparison of each of the 16 CPDAGs (rows) to all 16 reference CPDAGs (columns) is displayed. Hierarchical clustering was performed by row. (A) SHDs, (B) The percentages of the directed edges in each of the reference causal structures that remained directed with the same direction in each of the 16 CPDAGs.
Figure 3. Proteogenomic analysis. The comparison of each of the 16 CPDAGs (rows) to all 16 reference CPDAGs (columns) is displayed. Hierarchical clustering was performed by row. (A) SHDs, (B) The percentages of the directed edges in each of the reference causal structures that remained directed with the same direction in each of the 16 CPDAGs.
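The two quantities reported in panels (A) and (B) of Figures 1–3 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the graph representation, the function names, and the toy node labels (`g1` to `g4`) are hypothetical, and the SHD is assumed here to count each node pair whose edge state (absent, undirected, or directed one way or the other) differs between the two CPDAGs.

```python
from itertools import combinations


def edge_state(graph, a, b):
    """Return how nodes a and b are connected in `graph`.

    `graph` is a dict with two sets (an assumed, illustrative encoding):
    "directed" holds (parent, child) tuples and "undirected" holds
    frozenset({x, y}) pairs.
    """
    if (a, b) in graph["directed"]:
        return "a->b"
    if (b, a) in graph["directed"]:
        return "b->a"
    if frozenset((a, b)) in graph["undirected"]:
        return "a--b"
    return "none"


def shd(learned, reference, nodes):
    """Count the node pairs whose edge state differs between two CPDAGs."""
    return sum(
        edge_state(learned, a, b) != edge_state(reference, a, b)
        for a, b in combinations(sorted(nodes), 2)
    )


def pct_directed_preserved(learned, reference):
    """Percentage of directed reference edges kept with the same direction."""
    ref_directed = reference["directed"]
    if not ref_directed:
        return float("nan")  # undefined when the reference has no directed edge
    kept = len(ref_directed & learned["directed"])
    return 100.0 * kept / len(ref_directed)


# Toy example with hypothetical node names.
nodes = {"g1", "g2", "g3", "g4"}
reference = {
    "directed": {("g1", "g2"), ("g2", "g3")},
    "undirected": {frozenset(("g3", "g4"))},
}
learned = {
    "directed": {("g1", "g2"), ("g3", "g2")},  # g2 -> g3 has been reversed
    "undirected": set(),                        # g3 -- g4 has been lost
}

print(shd(learned, reference, nodes))              # 2
print(pct_directed_preserved(learned, reference))  # 50.0
```

In this toy case one edge is reversed and one is missing, giving an SHD of 2, while one of the two directed reference edges keeps its direction, giving 50%. Other SHD conventions weight reversals differently, so values from a specific software package may not match this sketch.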
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

